[lustre-discuss] Odd behavior with tunefs.lustre and device index

Backer backer.kolo at gmail.com
Thu Jan 25 10:40:08 PST 2024


Thank you Andreas.
Are you aware of any paid engagements/support for requests like these to
get changes done quickly?

On Wed, 24 Jan 2024 at 20:52, Andreas Dilger <adilger at whamcloud.com> wrote:

> This is more of a bug report and should be filed in Jira.
> That said, there is no guarantee that someone will be able to
> work on it in a timely manner.
>
> On Jan 24, 2024, at 09:47, Backer via lustre-discuss <
> lustre-discuss at lists.lustre.org> wrote:
>
> Just pushing this to the top of the inbox :)  Or is there another
> distribution list that is more appropriate for this type of question? I am
> also trying the devel mailing list.
>
> On Sun, 21 Jan 2024 at 18:34, Backer <backer.kolo at gmail.com> wrote:
>
>> Just to clarify: OSS-2 is completely powered off (a hard power-off with no
>> graceful shutdown) before I start working on OSS-3.
>>
>> On Sun, 21 Jan 2024 at 12:12, Backer <backer.kolo at gmail.com> wrote:
>>
>>> Hi All,
>>>
>>> I am seeing odd behavior with tunefs.lustre. After changing the failover
>>> node and trying to mount an OST, I get the following error:
>>>
>>> The target service's index is already in use. (/dev/sdd)
>>>
>>> After hitting the above error and performing --writeconf once, I can
>>> repeat these steps (see below) any number of times, on any OSS, without
>>> --writeconf.
>>>
>>> This is an effort to mount an OST on a new OSS. I simplified the steps
>>> and can now reproduce the behavior (see below) consistently. Could anyone
>>> help me understand what is happening?
>>>
>>> [root at OSS-2 opc]# lctl list_nids
>>> 10.99.101.18 at tcp1
>>> [root at OSS-2 opc]#
>>>
>>> [root at OSS-2 opc]# mkfs.lustre --reformat  --ost --fsname="testfs"
>>> --index="64"  --mgsnode "10.99.101.6 at tcp1" --mgsnode "10.99.101.7 at tcp1"
>>> --servicenode "10.99.101.18 at tcp1" "/dev/sdd"
>>>
>>>    Permanent disk data:
>>> Target:     testfs:OST0040
>>> Index:      64
>>> Lustre FS:  testfs
>>> Mount type: ldiskfs
>>> Flags:      0x1062
>>>               (OST first_time update no_primnode )
>>> Persistent mount opts: ,errors=remount-ro
>>> Parameters:  mgsnode=10.99.101.6 at tcp1:10.99.101.7 at tcp1
>>> failover.node=10.99.101.18 at tcp1
>>>
>>> device size = 51200MB
>>> formatting backing filesystem ldiskfs on /dev/sdd
>>> target name   testfs:OST0040
>>> kilobytes     52428800
>>> options        -J size=1024 -I 512 -i 69905 -q -O
>>> extents,uninit_bg,mmp,dir_nlink,quota,project,huge_file,^fast_commit,flex_bg
>>> -G 256 -E resize="4290772992",lazy_journal_init="0",lazy_itable_init="0" -F
>>> mkfs_cmd = mke2fs -j -b 4096 -L testfs:OST0040  -J size=1024 -I 512 -i
>>> 69905 -q -O
>>> extents,uninit_bg,mmp,dir_nlink,quota,project,huge_file,^fast_commit,flex_bg
>>> -G 256 -E resize="4290772992",lazy_journal_init="0",lazy_itable_init="0" -F
>>> /dev/sdd 52428800k
>>> Writing CONFIGS/mountdata
>>>
>>> [root at OSS-2 opc]# tunefs.lustre --dryrun /dev/sdd
>>> checking for existing Lustre data: found
>>>
>>>    Read previous values:
>>> Target:     testfs-OST0040
>>> Index:      64
>>> Lustre FS:  testfs
>>> Mount type: ldiskfs
>>> Flags:      0x1062
>>>               (OST first_time update no_primnode )
>>> Persistent mount opts: ,errors=remount-ro
>>> Parameters:  mgsnode=10.99.101.6 at tcp1:10.99.101.7 at tcp1
>>> failover.node=10.99.101.18 at tcp1
>>>
>>>
>>>    Permanent disk data:
>>> Target:     testfs:OST0040
>>> Index:      64
>>> Lustre FS:  testfs
>>> Mount type: ldiskfs
>>> Flags:      0x1062
>>>               (OST first_time update no_primnode )
>>> Persistent mount opts: ,errors=remount-ro
>>> Parameters:  mgsnode=10.99.101.6 at tcp1:10.99.101.7 at tcp1
>>> failover.node=10.99.101.18 at tcp1
>>>
>>> exiting before disk write.
>>> [root at OSS-2 opc]#
>>>
>>> [root at OSS-2 opc]# tunefs.lustre --erase-param failover.node
>>> --servicenode 10.99.101.18 at tcp1 /dev/sdd
>>> checking for existing Lustre data: found
>>>
>>>    Read previous values:
>>> Target:     testfs-OST0040
>>> Index:      64
>>> Lustre FS:  testfs
>>> Mount type: ldiskfs
>>> Flags:      0x1062
>>>               (OST first_time update no_primnode )
>>> Persistent mount opts: ,errors=remount-ro
>>> Parameters:  mgsnode=10.99.101.6 at tcp1:10.99.101.7 at tcp1
>>> failover.node=10.99.101.18 at tcp1
>>>
>>>
>>>    Permanent disk data:
>>> Target:     testfs:OST0040
>>> Index:      64
>>> Lustre FS:  testfs
>>> Mount type: ldiskfs
>>> Flags:      0x1062
>>>               (OST first_time update no_primnode )
>>> Persistent mount opts: ,errors=remount-ro
>>> Parameters:  mgsnode=10.99.101.6 at tcp1:10.99.101.7 at tcp1
>>> failover.node=10.99.101.18 at tcp1
>>>
>>> Writing CONFIGS/mountdata
>>>
>>> [root at OSS-2 opc]# mkdir /testfs-OST0040
>>> [root at OSS-2 opc]# mount -t lustre /dev/sdd  /testfs-OST0040
>>> mount.lustre: increased
>>> '/sys/devices/platform/host5/session3/target5:0:0/5:0:0:1/block/sdd/queue/max_sectors_kb'
>>> from 1024 to 16384
>>> [root at OSS-2 opc]#
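>>>
>>> As a quick sanity check at this point (a step not in my original notes,
>>> and the exact output will vary), the locally running Lustre devices can
>>> be listed on the OSS; the newly mounted OST should appear in the list:
>>>
>>> [root at OSS-2 opc]# lctl dl
>>>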
>>>
>>> [root at OSS-2 opc]# tunefs.lustre --dryrun /dev/sdd
>>> checking for existing Lustre data: found
>>>
>>>    Read previous values:
>>> Target:     testfs-OST0040
>>> Index:      64
>>> Lustre FS:  testfs
>>> Mount type: ldiskfs
>>> Flags:      0x1002
>>>               (OST no_primnode )
>>> Persistent mount opts: ,errors=remount-ro
>>> Parameters:  mgsnode=10.99.101.6 at tcp1:10.99.101.7 at tcp1
>>> failover.node=10.99.101.18 at tcp1
>>>
>>>
>>>    Permanent disk data:
>>> Target:     testfs-OST0040
>>> Index:      64
>>> Lustre FS:  testfs
>>> Mount type: ldiskfs
>>> Flags:      0x1002
>>>               (OST no_primnode )
>>> Persistent mount opts: ,errors=remount-ro
>>> Parameters:  mgsnode=10.99.101.6 at tcp1:10.99.101.7 at tcp1
>>> failover.node=10.99.101.18 at tcp1
>>>
>>> exiting before disk write.
>>> [root at OSS-2 opc]#
>>>
>>>
>>>
>>> Going over to OSS-3 and trying to mount the OST.
>>>
>>>
>>> [root at OSS-3 opc]# lctl list_nids
>>> 10.99.101.19 at tcp1
>>> [root at OSS-3 opc]#
>>>
>>> The parameters look the same as on OSS-2:
>>>
>>> [root at OSS-3 opc]#  tunefs.lustre --dryrun /dev/sdd
>>> checking for existing Lustre data: found
>>>
>>>    Read previous values:
>>> Target:     testfs-OST0040
>>> Index:      64
>>> Lustre FS:  testfs
>>> Mount type: ldiskfs
>>> Flags:      0x1002
>>>               (OST no_primnode )
>>> Persistent mount opts: ,errors=remount-ro
>>> Parameters:  mgsnode=10.99.101.6 at tcp1:10.99.101.7 at tcp1
>>> failover.node=10.99.101.18 at tcp1
>>>
>>>
>>>    Permanent disk data:
>>> Target:     testfs-OST0040
>>> Index:      64
>>> Lustre FS:  testfs
>>> Mount type: ldiskfs
>>> Flags:      0x1002
>>>               (OST no_primnode )
>>> Persistent mount opts: ,errors=remount-ro
>>> Parameters:  mgsnode=10.99.101.6 at tcp1:10.99.101.7 at tcp1
>>> failover.node=10.99.101.18 at tcp1
>>>
>>> exiting before disk write.
>>> [root at OSS-3 opc]#
>>>
>>> Changing the failover node to the current node.
>>>
>>> [root at OSS-3 opc]# tunefs.lustre --erase-param failover.node
>>> --servicenode 10.99.101.19 at tcp1 /dev/sdd
>>> checking for existing Lustre data: found
>>>
>>>    Read previous values:
>>> Target:     testfs-OST0040
>>> Index:      64
>>> Lustre FS:  testfs
>>> Mount type: ldiskfs
>>> Flags:      0x1002
>>>               (OST no_primnode )
>>> Persistent mount opts: ,errors=remount-ro
>>> Parameters:  mgsnode=10.99.101.6 at tcp1:10.99.101.7 at tcp1
>>> failover.node=10.99.101.18 at tcp1
>>>
>>>
>>>    Permanent disk data:
>>> Target:     testfs-OST0040
>>> Index:      64
>>> Lustre FS:  testfs
>>> Mount type: ldiskfs
>>> Flags:      0x1042
>>>               (OST update no_primnode )
>>> Persistent mount opts: ,errors=remount-ro
>>> Parameters:  mgsnode=10.99.101.6 at tcp1:10.99.101.7 at tcp1
>>> failover.node=10.99.101.19 at tcp1
>>>
>>>
>>> <waits here for the multi-mount protection (MMP) timeout>
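>>>
>>> (If you want to see how long that wait will be: the MMP parameters are
>>> stored in the ldiskfs superblock and, assuming e2fsprogs is installed on
>>> the OSS, can be read back like this:)
>>>
>>> [root at OSS-3 opc]# dumpe2fs -h /dev/sdd | grep -i mmp
>>>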
>>>
>>>
>>> After it completes the write, this OST is for some reason marked with the
>>> 'first_time' flag (0x1062), as seen in the next command.
>>>
>>> [root at OSS-3 opc]#  tunefs.lustre --dryrun /dev/sdd
>>> checking for existing Lustre data: found
>>>
>>>    Read previous values:
>>> Target:     testfs-OST0040
>>> Index:      64
>>> Lustre FS:  testfs
>>> Mount type: ldiskfs
>>> Flags:      0x1062
>>>               (OST first_time update no_primnode )
>>> Persistent mount opts: ,errors=remount-ro
>>> Parameters:  mgsnode=10.99.101.6 at tcp1:10.99.101.7 at tcp1
>>> failover.node=10.99.101.19 at tcp1
>>>
>>>
>>>    Permanent disk data:
>>> Target:     testfs:OST0040
>>> Index:      64
>>> Lustre FS:  testfs
>>> Mount type: ldiskfs
>>> Flags:      0x1062
>>>               (OST first_time update no_primnode )
>>> Persistent mount opts: ,errors=remount-ro
>>> Parameters:  mgsnode=10.99.101.6 at tcp1:10.99.101.7 at tcp1
>>> failover.node=10.99.101.19 at tcp1
>>>
>>> exiting before disk write.
>>> [root at OSS-3 opc]#
>>>
>>>
>>>
>>>
>>> The mount doesn't work here because the OST is marked as first_time, but
>>> it is not actually a first-time target: it was already mounted on OSS-2,
>>> and the MGS knows about it.
>>>
>>> [root at OSS-3 opc]#  mkdir /testfs-OST0040
>>> [root at OSS-3 opc]# mount -t lustre /dev/sdd  /testfs-OST0040
>>> mount.lustre: mount /dev/sdd at /testfs-OST0040 failed: Address already
>>> in use
>>> The target service's index is already in use. (/dev/sdd)
>>> [root at OSS-3 opc]#
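>>>
>>> One way to double-check that the MGS still has this index registered (a
>>> step I did not capture above; "MGS" here stands in for the actual MGS
>>> host) is to dump the live configuration for the filesystem on the MGS:
>>>
>>> [root at MGS ~]# lctl get_param mgs.MGS.live.testfs
>>>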
>>>
>>> From here, if I run tunefs.lustre with --writeconf, it works. Once this
>>> is done, repeating the above experiment any number of times on any server
>>> works as expected without --writeconf. (FYI: --writeconf is documented as
>>> a dangerous command.)
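>>>
>>> For reference, the recovery sequence on OSS-3 was roughly:
>>>
>>> [root at OSS-3 opc]# tunefs.lustre --writeconf /dev/sdd
>>> [root at OSS-3 opc]# mount -t lustre /dev/sdd  /testfs-OST0040
>>>
>>> i.e. --writeconf erases the target's configuration logs on the MGS so
>>> they are regenerated at the next mount, which is presumably why the
>>> mount then succeeds.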
>>>
>>>
>>>
>>>
>>> _______________________________________________
> lustre-discuss mailing list
> lustre-discuss at lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>
>
> Cheers, Andreas
> --
> Andreas Dilger
> Lustre Principal Architect
> Whamcloud
>