[lustre-discuss] Odd behavior with tunefs.lustre and device index

Backer backer.kolo at gmail.com
Sun Jan 21 15:34:58 PST 2024


Just to clarify: OSS-2 is completely powered off (a hard power off, without
any graceful shutdown) before I start working on OSS-3.

On Sun, 21 Jan 2024 at 12:12, Backer <backer.kolo at gmail.com> wrote:

> Hi All,
>
> I am seeing odd behavior with tunefs.lustre. After changing the failover
> node and trying to mount an OST, I get the following error:
>
> The target service's index is already in use. (/dev/sdd)
>
>
> After hitting the above error and performing --writeconf once, I can repeat
> these steps (see below) any number of times, on any OSS, without --writeconf.
>
>
> This is an effort to mount an OST on a new OSS. I simplified the steps and
> can now reproduce the behavior (see below) consistently. I was wondering if
> anyone could help me understand this?
>
> [root at OSS-2 opc]# lctl list_nids
>
> 10.99.101.18 at tcp1
>
> [root at OSS-2 opc]#
>
>
> [root at OSS-2 opc]# mkfs.lustre --reformat  --ost --fsname="testfs"
> --index="64"  --mgsnode "10.99.101.6 at tcp1" --mgsnode "10.99.101.7 at tcp1"
> --servicenode "10.99.101.18 at tcp1" "/dev/sdd"
>
>
>    Permanent disk data:
>
> Target:     testfs:OST0040
>
> Index:      64
>
> Lustre FS:  testfs
>
> Mount type: ldiskfs
>
> Flags:      0x1062
>
>               (OST first_time update no_primnode )
>
> Persistent mount opts: ,errors=remount-ro
>
> Parameters:  mgsnode=10.99.101.6 at tcp1:10.99.101.7 at tcp1
> failover.node=10.99.101.18 at tcp1
>
>
> device size = 51200MB
>
> formatting backing filesystem ldiskfs on /dev/sdd
>
> target name   testfs:OST0040
>
> kilobytes     52428800
>
> options        -J size=1024 -I 512 -i 69905 -q -O
> extents,uninit_bg,mmp,dir_nlink,quota,project,huge_file,^fast_commit,flex_bg
> -G 256 -E resize="4290772992",lazy_journal_init="0",lazy_itable_init="0" -F
>
> mkfs_cmd = mke2fs -j -b 4096 -L testfs:OST0040  -J size=1024 -I 512 -i
> 69905 -q -O
> extents,uninit_bg,mmp,dir_nlink,quota,project,huge_file,^fast_commit,flex_bg
> -G 256 -E resize="4290772992",lazy_journal_init="0",lazy_itable_init="0" -F
> /dev/sdd 52428800k
>
> Writing CONFIGS/mountdata
>
>
> [root at OSS-2 opc]# tunefs.lustre --dryrun /dev/sdd
>
> checking for existing Lustre data: found
>
>
>    Read previous values:
>
> Target:     testfs-OST0040
>
> Index:      64
>
> Lustre FS:  testfs
>
> Mount type: ldiskfs
>
> Flags:      0x1062
>
>               (OST first_time update no_primnode )
>
> Persistent mount opts: ,errors=remount-ro
>
> Parameters:  mgsnode=10.99.101.6 at tcp1:10.99.101.7 at tcp1
> failover.node=10.99.101.18 at tcp1
>
>
>
>    Permanent disk data:
>
> Target:     testfs:OST0040
>
> Index:      64
>
> Lustre FS:  testfs
>
> Mount type: ldiskfs
>
> Flags:      0x1062
>
>               (OST first_time update no_primnode )
>
> Persistent mount opts: ,errors=remount-ro
>
> Parameters:  mgsnode=10.99.101.6 at tcp1:10.99.101.7 at tcp1
> failover.node=10.99.101.18 at tcp1
>
>
> exiting before disk write.
>
> [root at OSS-2 opc]#
>
>
> [root at OSS-2 opc]# tunefs.lustre --erase-param failover.node --servicenode
> 10.99.101.18 at tcp1 /dev/sdd
>
> checking for existing Lustre data: found
>
>
>    Read previous values:
>
> Target:     testfs-OST0040
>
> Index:      64
>
> Lustre FS:  testfs
>
> Mount type: ldiskfs
>
> Flags:      0x1062
>
>               (OST first_time update no_primnode )
>
> Persistent mount opts: ,errors=remount-ro
>
> Parameters:  mgsnode=10.99.101.6 at tcp1:10.99.101.7 at tcp1
> failover.node=10.99.101.18 at tcp1
>
>
>
>    Permanent disk data:
>
> Target:     testfs:OST0040
>
> Index:      64
>
> Lustre FS:  testfs
>
> Mount type: ldiskfs
>
> Flags:      0x1062
>
>               (OST first_time update no_primnode )
>
> Persistent mount opts: ,errors=remount-ro
>
> Parameters:  mgsnode=10.99.101.6 at tcp1:10.99.101.7 at tcp1
> failover.node=10.99.101.18 at tcp1
>
>
> Writing CONFIGS/mountdata
>
>
> [root at OSS-2 opc]# mkdir /testfs-OST0040
>
> [root at OSS-2 opc]# mount -t lustre /dev/sdd  /testfs-OST0040
>
> mount.lustre: increased
> '/sys/devices/platform/host5/session3/target5:0:0/5:0:0:1/block/sdd/queue/max_sectors_kb'
> from 1024 to 16384
>
> [root at OSS-2 opc]#
>
>
> [root at OSS-2 opc]# tunefs.lustre --dryrun /dev/sdd
>
> checking for existing Lustre data: found
>
>
>    Read previous values:
>
> Target:     testfs-OST0040
>
> Index:      64
>
> Lustre FS:  testfs
>
> Mount type: ldiskfs
>
> Flags:      0x1002
>
>               (OST no_primnode )
>
> Persistent mount opts: ,errors=remount-ro
>
> Parameters:  mgsnode=10.99.101.6 at tcp1:10.99.101.7 at tcp1
> failover.node=10.99.101.18 at tcp1
>
>
>
>    Permanent disk data:
>
> Target:     testfs-OST0040
>
> Index:      64
>
> Lustre FS:  testfs
>
> Mount type: ldiskfs
>
> Flags:      0x1002
>
>               (OST no_primnode )
>
> Persistent mount opts: ,errors=remount-ro
>
> Parameters:  mgsnode=10.99.101.6 at tcp1:10.99.101.7 at tcp1
> failover.node=10.99.101.18 at tcp1
>
>
> exiting before disk write.
>
> [root at OSS-2 opc]#
>
>
>
>
> Going over to OSS-3 and trying to mount the OST.
>
>
>
> [root at OSS-3 opc]# lctl list_nids
>
> 10.99.101.19 at tcp1
>
> [root at OSS-3 opc]#
>
>
> The parameters look the same as on OSS-2:
>
>
> [root at OSS-3 opc]#  tunefs.lustre --dryrun /dev/sdd
>
> checking for existing Lustre data: found
>
>
>    Read previous values:
>
> Target:     testfs-OST0040
>
> Index:      64
>
> Lustre FS:  testfs
>
> Mount type: ldiskfs
>
> Flags:      0x1002
>
>               (OST no_primnode )
>
> Persistent mount opts: ,errors=remount-ro
>
> Parameters:  mgsnode=10.99.101.6 at tcp1:10.99.101.7 at tcp1
> failover.node=10.99.101.18 at tcp1
>
>
>
>    Permanent disk data:
>
> Target:     testfs-OST0040
>
> Index:      64
>
> Lustre FS:  testfs
>
> Mount type: ldiskfs
>
> Flags:      0x1002
>
>               (OST no_primnode )
>
> Persistent mount opts: ,errors=remount-ro
>
> Parameters:  mgsnode=10.99.101.6 at tcp1:10.99.101.7 at tcp1
> failover.node=10.99.101.18 at tcp1
>
>
> exiting before disk write.
>
> [root at OSS-3 opc]#
>
>
> Changing the failover node to the current node:
>
>
> [root at OSS-3 opc]# tunefs.lustre --erase-param failover.node --servicenode
> 10.99.101.19 at tcp1 /dev/sdd
>
> checking for existing Lustre data: found
>
>
>    Read previous values:
>
> Target:     testfs-OST0040
>
> Index:      64
>
> Lustre FS:  testfs
>
> Mount type: ldiskfs
>
> Flags:      0x1002
>
>               (OST no_primnode )
>
> Persistent mount opts: ,errors=remount-ro
>
> Parameters:  mgsnode=10.99.101.6 at tcp1:10.99.101.7 at tcp1
> failover.node=10.99.101.18 at tcp1
>
>
>
>    Permanent disk data:
>
> Target:     testfs-OST0040
>
> Index:      64
>
> Lustre FS:  testfs
>
> Mount type: ldiskfs
>
> Flags:      0x1042
>
>               (OST update no_primnode )
>
> Persistent mount opts: ,errors=remount-ro
>
> Parameters:  mgsnode=10.99.101.6 at tcp1:10.99.101.7 at tcp1
> failover.node=10.99.101.19 at tcp1
>
>
>
> <waits here for the MMP timeout (multi-mount protection)>
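>
> (If useful for anyone reproducing this: while waiting, the MMP state of the
> backing filesystem can be inspected with dumpe2fs; a sketch, assuming the
> e2fsprogs tools are available on the OSS:
>
> [root at OSS-3 opc]# dumpe2fs -h /dev/sdd 2>/dev/null | grep -i mmp
>
> This should print the mmp feature flag plus the MMP block number and update
> interval while the feature is active.)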
>
> After the write completes, for some reason this OST is marked with the
> 'first_time' flag (0x1062) in the next command.
>
> [root at OSS-3 opc]#  tunefs.lustre --dryrun /dev/sdd
>
> checking for existing Lustre data: found
>
>
>    Read previous values:
>
> Target:     testfs-OST0040
>
> Index:      64
>
> Lustre FS:  testfs
>
> Mount type: ldiskfs
>
> Flags:      0x1062
>
>               (OST first_time update no_primnode )
>
> Persistent mount opts: ,errors=remount-ro
>
> Parameters:  mgsnode=10.99.101.6 at tcp1:10.99.101.7 at tcp1
> failover.node=10.99.101.19 at tcp1
>
>
>
>    Permanent disk data:
>
> Target:     testfs:OST0040
>
> Index:      64
>
> Lustre FS:  testfs
>
> Mount type: ldiskfs
>
> Flags:      0x1062
>
>               (OST first_time update no_primnode )
>
> Persistent mount opts: ,errors=remount-ro
>
> Parameters:  mgsnode=10.99.101.6 at tcp1:10.99.101.7 at tcp1
> failover.node=10.99.101.19 at tcp1
>
>
> exiting before disk write.
>
> [root at OSS-3 opc]#
>
>
>
>
> The mount doesn't work here because the OST is marked as first_time, but
> this is not a first-time mount: the OST was already mounted on OSS-2, and
> the MGS knows about it.
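>
> (The MGS's view can presumably be confirmed on one of the MGS nodes
> (10.99.101.6/10.99.101.7 here) with something like the following, which
> lists the targets the MGS currently knows about, assuming the live config
> parameter is exposed; the prompt below is hypothetical:
>
> [root at MGS ~]# lctl get_param mgs.MGS.live.testfs
>
> )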
>
> [root at OSS-3 opc]#  mkdir /testfs-OST0040
>
> [root at OSS-3 opc]# mount -t lustre /dev/sdd  /testfs-OST0040
>
> mount.lustre: mount /dev/sdd at /testfs-OST0040 failed: Address already in
> use
>
> The target service's index is already in use. (/dev/sdd)
>
> [root at OSS-3 opc]#
>
>
> From here, if I run tunefs.lustre with --writeconf, the mount works. Once
> this is done, repeating the above experiment any number of times on any
> server works as expected without using --writeconf. (FYI: --writeconf is
> documented as a dangerous command.)
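>
> (For reference, the working sequence is presumably along these lines; the
> --writeconf flag makes the MGS regenerate the configuration logs for the
> target on its next mount:
>
> [root at OSS-3 opc]# tunefs.lustre --writeconf /dev/sdd
>
> [root at OSS-3 opc]# mount -t lustre /dev/sdd /testfs-OST0040
>
> )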
>