[lustre-discuss] problem after upgrading 2.10.4 to 2.12.4

Hebenstreit, Michael michael.hebenstreit at intel.com
Tue Jun 23 13:03:39 PDT 2020


Another tidbit: the 2 OST nodes showing problems have an lfsck running, and I cannot stop it.

[root at elfsa2o1 ~]# grep status /proc/fs/lustre/osd-zfs/lfsarc02-OST*/oi_scrub
/proc/fs/lustre/osd-zfs/lfsarc02-OST0000/oi_scrub:status: completed
/proc/fs/lustre/osd-zfs/lfsarc02-OST0002/oi_scrub:status: scanning
/proc/fs/lustre/osd-zfs/lfsarc02-OST0004/oi_scrub:status: scanning
/proc/fs/lustre/osd-zfs/lfsarc02-OST0006/oi_scrub:status: scanning
/proc/fs/lustre/osd-zfs/lfsarc02-OST0008/oi_scrub:status: scanning
/proc/fs/lustre/osd-zfs/lfsarc02-OST000a/oi_scrub:status: scanning
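
To keep an eye on this across all targets of a node, the grep above can be wrapped in a small helper that prints only the OSTs whose scrub has not completed. This is just a sketch: the function name scrub_not_completed is made up for illustration, the default glob is the procfs path from the grep above, and it assumes each oi_scrub file contains a line of the form "status: <state>", as the output above suggests.

```shell
# Print "<OST name> <status>" for every OST whose OI scrub is not "completed".
# scrub_not_completed is a hypothetical helper, not part of Lustre.
# The optional argument is a glob for the oi_scrub files; the default is the
# path used in the grep above. The glob must not contain spaces.
scrub_not_completed() {
    glob="${1:-/proc/fs/lustre/osd-zfs/lfsarc02-OST*/oi_scrub}"
    for f in $glob; do
        # Skip unmatched globs and unreadable files.
        [ -r "$f" ] || continue
        # Take the value of the first "status:" line.
        status=$(awk '$1 == "status:" { print $2; exit }' "$f")
        [ "$status" = "completed" ] && continue
        # The OST name is the directory that holds the oi_scrub file.
        ost=$(basename "$(dirname "$f")")
        printf '%s %s\n' "$ost" "${status:-unknown}"
    done
}
```

Calling scrub_not_completed without arguments on the OSS would list the targets that are still scanning; running it under watch, or in a loop with sleep, would show whether any of them ever finishes.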

An lfsck on the MDT hangs, as orphaned inodes cannot be deleted

[ 9568.345851] LustreError: 6592:0:(osp_precreate.c:970:osp_precreate_cleanup_orphans()) lfsarc02-OST0006-osc-MDT0000: cannot cleanup orphans: rc = -22
[ 9568.364339] LustreError: 6592:0:(osp_precreate.c:970:osp_precreate_cleanup_orphans()) Skipped 6590 previous similar messages

Is there any way to stop the scans on the OSTs?

From: Hebenstreit, Michael
Sent: Tuesday, June 23, 2020 11:19
To: lustre-discuss at lists.lustre.org
Subject: problem after upgrading 2.10.4 to 2.12.4

On our archive Lustre file system (ZFS based, 4 OST servers with 6 OST pools each) we experienced the very same issues as described here:

https://jira.whamcloud.com/browse/LU-13392

Certain directories cannot be accessed, and the OSTs show thousands of "Can't find FID Sequence" errors. Unfortunately, I cannot even start the recommended file system check on the OST devices. Example:

[root at elfsa2o1 ~]# lctl lfsck_start -o -M lfsarc02-OST0002
Fail to start LFSCK: Operation not permitted
[root at elfsa2o1 ~]# lctl lfsck_start -M lfsarc02-OST0002
Fail to start LFSCK: Operation not supported

On a similar system that was first installed as 2.10.4, then upgraded to 2.10.8, and now is also running on 2.12.4, at least the second command starts:
# lctl lfsck_start -M lfsarc01-OST0002

The commands are issued on the servers where the respective ZFS pools are actually running.

Questions:
Is there any way to force the file system checks?
Has anyone found a workaround for the FID sequence errors?
Can I downgrade from 2.12.4 to 2.10.8 without destroying the FS?
Has the error described in https://jira.whamcloud.com/browse/LU-13392 been fixed in 2.12.5?

Thanks
Michael


------------------------------------------------------------------------
Michael Hebenstreit                 Senior Cluster Architect
Intel Corporation, MS: RR1-105/H14  TSACG
1600 Rio Rancho Blvd SE             Tel.:   +1 505-794-3144
Rio Rancho, NM 87124
UNITED STATES                       E-mail: michael.hebenstreit at intel.com

