[lustre-discuss] Attempting to recover zfs ost after file corruption

Mohr, Rick mohrrf at ornl.gov
Wed Mar 3 14:03:50 PST 2021


I have a file system running Lustre 2.10.4 on CentOS 7.5 with zfs 0.7.9 that I am attempting to keep functional until we can move data to a new Lustre file system.  We recently had a couple of osts suffer from some data corruption, and after getting them imported and running a scrub, it seems the errors may be confined to two directories on the ost's underlying zfs file system: CONFIGS/ and oi.10/.

Is it possible to simply remove these directories and have them rebuilt automatically when the ost is remounted?  My hope is that any files under CONFIGS/ would be repopulated when the ost reconnects to the mgs.  If needed, I can always extract those files directly from the mgt.  The one thing I am not sure about is how to handle the oi.10/ directory.

I reviewed the procedure in the Lustre manual for restoring an ost from a file-level backup.  Since it looks like all the user files are still intact, my thought was that I could skip the actual file restoration step and just proceed with the steps that remove CATALOGS, oi.*, LFSCK, etc.  The main difference is that, since I am not reformatting the ost, I wouldn't be able to add the "--replace" flag, which sounds like it is used to trigger some of the recovery steps.
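
To make that abbreviated procedure concrete, here is a rough, read-only sketch that only lists which top-level entries on the ost's zfs file system it would touch.  It deletes nothing.  The mount point /mnt/ost_zfs and the function name are hypothetical, and the pattern list holds only the three names mentioned above; the "etc." is deliberately not expanded, so check the manual for the full list.

```python
import fnmatch
import os

# Names taken from the paragraph above. The "etc." is NOT expanded here;
# consult the Lustre manual for the complete list before acting on this.
PATTERNS = ["CATALOGS", "oi.*", "LFSCK"]

# Hypothetical mount point of the ost's underlying zfs dataset.
OST_ROOT = "/mnt/ost_zfs"


def find_rebuild_candidates(root, patterns=PATTERNS):
    """Return top-level entries under root whose names match any pattern.

    Read-only: this only inspects the directory listing.
    """
    matches = []
    for name in sorted(os.listdir(root)):
        if any(fnmatch.fnmatchcase(name, pat) for pat in patterns):
            matches.append(os.path.join(root, name))
    return matches


if __name__ == "__main__":
    if os.path.isdir(OST_ROOT):
        for path in find_rebuild_candidates(OST_ROOT):
            kind = "dir " if os.path.isdir(path) else "file"
            print(kind, path)
    else:
        print("not mounted:", OST_ROOT)
```

Note that CONFIGS/ is intentionally absent from the pattern list, since the manual's steps as summarized above do not name it; whether it can be removed safely is part of the question.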

Any help is greatly appreciated.

--Rick


