[lustre-discuss] [Urgent] Multiple issues in Lustre 2.12.2.

Udai Sharma udai.sharma5 at gmail.com
Tue Aug 27 00:35:30 PDT 2019


Hello Team,
I am facing multiple issues when configuring Lustre in a clustered
environment with multiple OSTs under HA-LVM, plus one MGS server and one
MDT server.

Issues:
1. OST00* go to the INACTIVE state when the corresponding disk is
unmounted from the OST server, and they never become active again even
after the disk is remounted. I had to reformat every time with a
different --index to get the volumes listed again, hence the many
inactive entries in the 'lctl dl' output.
2. The recovery status is always inactive.
3. How can the INACTIVE OSTs be re-activated at runtime? (My current
understanding of the relevant commands is sketched just below.)
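
(For issue 3: my current understanding from the lctl documentation is that
a deactivated OST can be toggled back without a reformat. A sketch, where
lustre-OST006f-osc-MDT0000 is just an example device name from my setup:

# on the MDS: list devices and find the OSC entry for the inactive OST
lctl dl
# temporarily re-activate it on the MDS
lctl --device lustre-OST006f-osc-MDT0000 activate
# or persistently, run on the MGS
lctl conf_param lustre-OST006f.osc.active=1

Is this the right approach here?)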

************************************************
Below are the topology, logs, and other details:
------------------------------------------------

[HA1]< -------- [N3]-------- >[N4]
|                |              |
|                |              |
-------------[Client]------------



[N1,N2] = HA1 --> OSTs
 N3           --> MGS
 N4           --> MDT

N1 -> 3 Logical volumes [OST1,OST2,OST3]
N2 -> 3 Logical volumes [OST4,OST5,OST6]
N3 -> 1 Logical volume  [MGT1]
N4 -> 1 Logical volume  [MDT1]

------------------------------------------
N3 [MGS]

Created the zpool, formatted it, and mounted it:

zpool create -f -O canmount=off -o multihost=on -o cachefile=none lustre
 /dev/mgs/mgs01
mkfs.lustre --reformat --mgs --backfstype=zfs lustre/mgs01
mount.lustre lustre/mgs01 /mnt/mgs/
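
(One thing I would like to confirm about the pool options: as far as I
know, multihost=on requires a unique, persistent hostid on every server
node, generated for example with zgenhostid from ZFS on Linux 0.7+:

# run once per node; writes /etc/hostid if it does not already exist
zgenhostid
# verify a non-zero hostid
hostid
)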

------------------------------------------

N4 [MDT]

Created the zpool, formatted it, and mounted it:
zpool create -f -O canmount=off -o multihost=on -o cachefile=none lustre
/dev/mdt/mdt01
mkfs.lustre --reformat --mdt --fsname=lustre --index=0
--mgsnode=10.2.2.202@tcp1 --backfstype=zfs lustre/mdt01
mount.lustre lustre/mdt01 /mnt/mdt
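
(For issue 2, the recovery status I am referring to is the one reported by
the standard parameters, i.e.:

# on the MDS
lctl get_param mdt.lustre-MDT0000.recovery_status
# on each OSS
lctl get_param obdfilter.*.recovery_status
)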

----------------------------------------


HA1 [HA-LVM system]
N1 [OST1,OST2,OST3]

Created the zpool, formatted it, and mounted it:
zpool create lustre -f -O canmount=off -o multihost=on -o cachefile=none
/dev/vg_e/thinvolume1 /dev/vg_e/thinvolume2 /dev/vg_e/thinvolume3
mkfs.lustre --reformat --ost --backfstype=zfs --fsname=lustre --index=111
--mgsnode=10.2.2.202@tcp1 --servicenode=10.2.2.239@tcp1:10.2.2.241@tcp1
lustre/ost01 ; mount.lustre lustre/ost01 /mnt/ost01/

mkfs.lustre --reformat --ost --backfstype=zfs --fsname=lustre --index=222
--mgsnode=10.2.2.202@tcp1 --servicenode=10.2.2.239@tcp1:10.2.2.241@tcp1
lustre/ost02 ; mount.lustre lustre/ost02 /mnt/ost02/

mkfs.lustre --reformat --ost --backfstype=zfs --fsname=lustre --index=333
--mgsnode=10.2.2.202@tcp1 --servicenode=10.2.2.239@tcp1:10.2.2.241@tcp1
lustre/ost03 ; mount.lustre lustre/ost03 /mnt/ost03/
df -h | grep lustre
lustre/ost01    287G  3.0M  287G   1% /mnt/ost01
lustre/ost02    287G  3.0M  287G   1% /mnt/ost02
lustre/ost03    287G  3.0M  287G   1% /mnt/ost03
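
(After mounting I also verify that each target came up locally; the exact
device list will differ per node:

# each mounted OST should appear as an obdfilter device in state UP
lctl dl | grep obdfilter
)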

N2 [OST4,OST5,OST6]

Created the zpool, formatted it, and mounted it:
zpool create -f -O canmount=off -o multihost=on -o cachefile=none lustre
 /dev/vg_p/thinvolume1 /dev/vg_p/thinvolume2 /dev/vg_p/thinvolume3
mkfs.lustre --reformat --ost --backfstype=zfs --fsname=lustre --index=444
--mgsnode=10.2.2.202@tcp1 --servicenode=10.2.2.239@tcp1:10.2.2.241@tcp1
lustre/ost04 ; mount.lustre lustre/ost04 /mnt/ost04

mkfs.lustre --reformat --ost --backfstype=zfs --fsname=lustre --index=555
--mgsnode=10.2.2.202@tcp1 --servicenode=10.2.2.239@tcp1:10.2.2.241@tcp1
lustre/ost05 ; mount.lustre lustre/ost05 /mnt/ost05

mkfs.lustre --reformat --ost --backfstype=zfs --fsname=lustre --index=666
--mgsnode=10.2.2.202@tcp1 --servicenode=10.2.2.239@tcp1:10.2.2.241@tcp1
lustre/ost06 ; mount.lustre lustre/ost06 /mnt/ost06
df -h | grep lustre
lustre/ost04    287G  3.0M  287G   1% /mnt/ost04
lustre/ost05    287G  3.0M  287G   1% /mnt/ost05
lustre/ost06    287G  3.0M  287G   1% /mnt/ost06
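
(Side question: if the stale INACTIVE entries cannot be re-activated, is
regenerating the configuration logs the correct way to drop them? My
understanding of the writeconf procedure, as a sketch (all clients and
targets unmounted first, then on each server):

tunefs.lustre --writeconf lustre/mgs01   # on N3 (MGS)
tunefs.lustre --writeconf lustre/mdt01   # on N4 (MDT)
tunefs.lustre --writeconf lustre/ost0N   # on N1/N2, once per OST dataset
# then remount in order: MGS, MDT, OSTs, clients
)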


Created a PCS cluster over HA1; the resource groups are:
Resource Group: electron
     vg_e       (ocf::heartbeat:LVM):   Started gp-electron
     zfs-pool-electron  (ocf::heartbeat:ZFS):   Started electron
     lustre-ost1        (ocf::heartbeat:Lustre):        Started electron
     lustre-ost2        (ocf::heartbeat:Lustre):        Started electron
     lustre-ost3        (ocf::heartbeat:Lustre):        Started electron

 Resource Group: proton
     vg_p       (ocf::heartbeat:LVM):   Started gp-proton
     zfs-pool-proton    (ocf::heartbeat:ZFS):   Started proton
     lustre-ost4        (ocf::heartbeat:Lustre):        Started proton
     lustre-ost5        (ocf::heartbeat:Lustre):        Started proton
     lustre-ost6        (ocf::heartbeat:Lustre):        Started proton
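
(For reference, each Lustre resource above was created roughly as below; a
sketch, assuming the ocf:heartbeat:Lustre agent with its target and
mountpoint parameters:

pcs resource create lustre-ost1 ocf:heartbeat:Lustre \
    target=lustre/ost01 mountpoint=/mnt/ost01 --group electron
)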

----------------------------------------

Client:
# mount | grep lustre
10.2.2.202@tcp1:/lustre on /lustre type lustre (rw,lazystatfs)
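
(From the client I check per-OST state as below; as far as I understand,
the active parameter is what 'lfs osts' reports as ACTIVE/INACTIVE:

lctl get_param osc.*.active   # 1 = active, 0 = deactivated
lfs check osts                # as root; pings every OST
)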


# lfs osts
OBDS:
1: lustre-OST0001_UUID INACTIVE
2: lustre-OST0002_UUID INACTIVE
3: lustre-OST0003_UUID INACTIVE
4: lustre-OST0004_UUID INACTIVE
5: lustre-OST0005_UUID INACTIVE
6: lustre-OST0006_UUID INACTIVE
10: lustre-OST000a_UUID INACTIVE
11: lustre-OST000b_UUID INACTIVE
20: lustre-OST0014_UUID INACTIVE
22: lustre-OST0016_UUID INACTIVE
30: lustre-OST001e_UUID INACTIVE
33: lustre-OST0021_UUID INACTIVE
40: lustre-OST0028_UUID INACTIVE
44: lustre-OST002c_UUID INACTIVE
50: lustre-OST0032_UUID INACTIVE
55: lustre-OST0037_UUID INACTIVE
60: lustre-OST003c_UUID INACTIVE
66: lustre-OST0042_UUID INACTIVE
111: lustre-OST006f_UUID ACTIVE
222: lustre-OST00de_UUID ACTIVE
333: lustre-OST014d_UUID ACTIVE
444: lustre-OST01bc_UUID ACTIVE
555: lustre-OST022b_UUID ACTIVE
666: lustre-OST029a_UUID ACTIVE
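
(Note: 'lfs osts' shows the --index values in hex inside the UUIDs, e.g.
index 111 = 0x6f, hence lustre-OST006f; the lower INACTIVE indexes are
left over from the earlier format attempts described in issue 1.)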


# lfs mdts
MDTS:
0: lustre-MDT0000_UUID ACTIVE

------------------------------------------------------

Related earlier thread:
Lustre in HA-LVM Cluster issue
http://lists.lustre.org/pipermail/lustre-discuss-lustre.org/2019-August/016650.html

******************************************************
Regards,
-Udai Sharma