[lustre-devel] [lustre-discuss] Lustre switching to loopback LNet interface when it is not desired
Backer
backer.kolo at gmail.com
Thu Nov 7 09:21:00 PST 2024
While walking through the code, I found an LNet module parameter,
local_nid_dist_zero. Setting it to 0 resolves the issue. Just putting it
here in case anyone is searching for the same thing in the future.
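
For anyone hitting the same thing, here is a minimal sketch of making the
setting persistent. It assumes the stock lnet kernel module and a
conventional /etc/modprobe.d file name; the option only takes effect when
the module is (re)loaded, so LNet has to be restarted for it to apply:

    # /etc/modprobe.d/lustre.conf  (file name is just a convention)
    # Stop treating local NIDs as distance zero, so the import keeps the
    # real IP instead of being rewritten to 0@lo.
    options lnet local_nid_dist_zero=0

After reloading the module, the active value can be confirmed with:

    cat /sys/module/lnet/parameters/local_nid_dist_zero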
On Wed, 6 Nov 2024 at 13:39, Backer <backer.kolo at gmail.com> wrote:
> Hi Chris,
>
> Thank you for looking into this. I agree. In cloud and other types of
> on-prem networks, a floating IP is a real mechanism for providing HA,
> and I am attempting to make it work here. Since the IP move completes
> in under a second in these environments, the failover finishes within a
> few seconds and clients barely notice any delay. The loopback
> optimization is undesirable in this kind of environment. If no
> parameter already exists to change this behavior, how can I make it
> work in this environment? I wonder if it requires a code change? If so,
> I could look into it if someone can help with some pointers.
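>
> For reference, the preference can be observed from user space. A small
> sketch (assuming lctl which_nid on this 2.15 build picks the closest
> NID with the same distance logic the import code uses):
>
>   # ask LNet which of the two candidate NIDs it would use; by default
>   # it prefers the local interface, reported as 0@lo
>   lctl which_nid 10.99.100.152@tcp1 0@lo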
>
> Regards
>
> Aboo
>
> On Wed, Nov 6, 2024 at 11:05 AM Horn, Chris <chris.horn at hpe.com> wrote:
>
>> Here the failover is designed in such a way that the IP address moves
>> (fails over) with the OST and becomes active on the other server.
>>
>>
>>
>> This is probably the source of your problem. I would suggest assigning
>> unique IP addresses to each OSS.
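>>
>> If you go that route, a sketch of what rewriting a target's failover
>> parameter could look like (10.99.100.153@tcp1 is a hypothetical unique
>> address for the peer OSS; --erase-params clears all existing
>> parameters, so mgsnode must be restated, and the target must be
>> unmounted first):
>>
>>   tunefs.lustre --erase-params \
>>     --param mgsnode=10.99.100.221@tcp1 \
>>     --param failover.node=10.99.100.153@tcp1 \
>>     /dev/sda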
>>
>>
>>
>> Chris Horn
>>
>>
>>
>> *From: *lustre-discuss <lustre-discuss-bounces at lists.lustre.org> on
>> behalf of Backer <backer.kolo at gmail.com>
>> *Date: *Tuesday, November 5, 2024 at 10:19 PM
>> *To: *Backer via lustre-discuss <lustre-discuss at lists.lustre.org>,
>> lustre-devel at lists.lustre.org <lustre-devel at lists.lustre.org>
>> *Subject: *Re: [lustre-discuss] Lustre switching to loopback LNet
>> interface when it is not desired
>>
>> Any ideas on how to avoid using 0@lo as failover_nids? Please see below.
>>
>>
>>
>> On Tue, 5 Nov 2024 at 12:34, Backer <backer.kolo at gmail.com> wrote:
>>
>> Hi,
>>
>>
>>
>> I am mounting the Lustre file system on the OSS. Some of the OSTs are
>> locally attached to the OSS.
>>
>> The failover IP on the OST is "10.99.100.152". It is a local LNet NID
>> on the OSS. However, when the client mounts it, the import
>> automatically changes to 0@lo. That is undesirable here because when
>> this OST fails over to another server, the client keeps trying to
>> connect to 0@lo even though the OST is no longer on the same host.
>> This makes the client fs mount hang forever.
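>>
>> A quick way to watch this from the client side is to read the import
>> (a sketch using lctl get_param instead of the raw proc path shown
>> further below; the osc instance name is whatever the client created):
>>
>>   lctl get_param osc.fs-OST0000-osc-*.import | grep -E 'failover_nids|current_connection'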
>>
>> Here the failover is designed in such a way that the IP address moves
>> (fails over) with the OST and becomes active on the other server.
>>
>> How can I make the import point to the real IP and not the loopback,
>> so that the failover works?
>>
>>
>>
>>
>>
>> [oss000 ~]$ lfs df
>> UUID                 1K-blocks       Used  Available Use% Mounted on
>> fs-MDT0000_UUID       29068444      25692   26422344   1% /mnt/fs[MDT:0]
>> fs-OST0000_UUID       50541812   30160292   17743696  63% /mnt/fs[OST:0]
>> fs-OST0001_UUID       50541812   29301740   18602248  62% /mnt/fs[OST:1]
>> fs-OST0002_UUID       50541812   29356508   18547480  62% /mnt/fs[OST:2]
>> fs-OST0003_UUID       50541812    8822980   39081008  19% /mnt/fs[OST:3]
>>
>> filesystem_summary:  202167248   97641520   93974432  51% /mnt/fs
>>
>> [oss000 ~]$ df -h
>> Filesystem                  Size  Used Avail Use% Mounted on
>> devtmpfs                     30G     0   30G   0% /dev
>> tmpfs                        30G  8.1M   30G   1% /dev/shm
>> tmpfs                        30G   25M   30G   1% /run
>> tmpfs                        30G     0   30G   0% /sys/fs/cgroup
>> /dev/mapper/ocivolume-root   36G   17G   19G  48% /
>> /dev/sdc2                  1014M  637M  378M  63% /boot
>> /dev/mapper/ocivolume-oled   10G  2.5G  7.6G  25% /var/oled
>> /dev/sdc1                   100M  5.1M   95M   6% /boot/efi
>> tmpfs                       5.9G     0  5.9G   0% /run/user/987
>> tmpfs                       5.9G     0  5.9G   0% /run/user/0
>> /dev/sdb                     49G   28G   18G  62% /fs-OST0001
>> /dev/sda                     49G   29G   17G  63% /fs-OST0000
>> tmpfs                       5.9G     0  5.9G   0% /run/user/1000
>> 10.99.100.221@tcp1:/fs      193G   94G   90G  51% /mnt/fs
>>
>> [oss000 ~]$ sudo tunefs.lustre --dryrun /dev/sda
>> checking for existing Lustre data: found
>>
>> Read previous values:
>> Target: fs-OST0000
>> Index: 0
>> Lustre FS: fs
>> Mount type: ldiskfs
>> Flags: 0x1002
>> (OST no_primnode )
>> Persistent mount opts: ,errors=remount-ro
>> Parameters: mgsnode=10.99.100.221@tcp1 failover.node=10.99.100.152@tcp1,10.99.100.152@tcp1
>>
>>
>> Permanent disk data:
>> Target: fs-OST0000
>> Index: 0
>> Lustre FS: fs
>> Mount type: ldiskfs
>> Flags: 0x1002
>> (OST no_primnode )
>> Persistent mount opts: ,errors=remount-ro
>> Parameters: mgsnode=10.99.100.221@tcp1 failover.node=10.99.100.152@tcp1,10.99.100.152@tcp1
>>
>> exiting before disk write.
>>
>>
>> [oss000 proc]# cat /proc/fs/lustre/osc/fs-OST0000-osc-ffff89c57672e000/import
>> import:
>>     name: fs-OST0000-osc-ffff89c57672e000
>>     target: fs-OST0000_UUID
>>     state: IDLE
>>     connect_flags: [ write_grant, server_lock, version, request_portal,
>>         max_byte_per_rpc, early_lock_cancel, adaptive_timeouts, lru_resize,
>>         alt_checksum_algorithm, fid_is_enabled, version_recovery,
>>         grant_shrink, full20, layout_lock, 64bithash, object_max_bytes,
>>         jobstats, einprogress, grant_param, lvb_type, short_io, lfsck,
>>         bulk_mbits, second_flags, lockaheadv2, increasing_xid,
>>         client_encryption, lseek, reply_mbits ]
>>     connect_data:
>>         flags: 0xa0425af2e3440078
>>         instance: 39
>>         target_version: 2.15.3.0
>>         initial_grant: 8437760
>>         max_brw_size: 4194304
>>         grant_block_size: 4096
>>         grant_inode_size: 32
>>         grant_max_extent_size: 67108864
>>         grant_extent_tax: 24576
>>         cksum_types: 0xf7
>>         max_object_bytes: 17592186040320
>>     import_flags: [ replayable, pingable, connect_tried ]
>>     connection:
>>         failover_nids: [ 0@lo, 0@lo ]
>>         current_connection: 0@lo
>>         connection_attempts: 1
>>         generation: 1
>>         in-progress_invalidations: 0
>>         idle: 36 sec
>>     rpcs:
>>         inflight: 0
>>         unregistering: 0
>>         timeouts: 0
>>         avg_waittime: 2627 usec
>>     service_estimates:
>>         services: 1 sec
>>         network: 1 sec
>>     transactions:
>>         last_replay: 0
>>         peer_committed: 0
>>         last_checked: 0
>>
>>