[lustre-devel] Question about how online LFSCK works

Andreas Dilger adilger at whamcloud.com
Wed Nov 23 09:25:05 PST 2022

LFSCK runs as a multi-stage process. At the lowest level, OI Scrub scans the ldiskfs or ZFS inodes on storage and performs local sanity checks (e.g. the FID stored in the xattr vs. the OI FID->inode mapping). LFSCK then consumes these objects (semi-asynchronously) to perform distributed consistency checks (e.g. namespace checks for remote/striped directories, link EA checks for child->parent directory lookup, layout checks for MDT inode to OST object references, etc.). For certain types of checks, objects found to be inconsistent (e.g. unreferenced OST objects) may be put on a list for supplementary checking after the scan has completed.

Since the scan proceeds linearly through the underlying filesystem's inode table, it can of course miss new entries added behind the current cursor, and it will see new entries added ahead of the cursor. In most cases these objects can be checked independently and without issue, since it is very rare for the scan and a concurrent change to touch the same object at the same time.
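To make the cursor behaviour concrete, here is a minimal sketch of a linear scan over a toy inode table. It is not LFSCK code: the table layout, the `linear_scan` helper, and the object names are all made up for illustration.

```python
def linear_scan(table, creations, size):
    """Visit slots 0..size-1 in order, like a cursor over an inode table.

    table:     dict mapping slot number -> object name (hypothetical layout)
    creations: list of (when_cursor, slot, name); each object is created
               just before the cursor visits slot `when_cursor`
    """
    table = dict(table)  # work on a copy so the caller's table is untouched
    visited = []
    for cursor in range(size):
        # Apply concurrent creations that happen at this point in the scan.
        for when, slot, name in creations:
            if when == cursor:
                table[slot] = name
        if cursor in table:
            visited.append(table[cursor])
    return visited


initial = {0: "a", 2: "b", 5: "c"}
# While the cursor is at slot 3, one object appears behind it (slot 1)
# and one appears ahead of it (slot 4).
creations = [(3, 1, "new-behind"), (3, 4, "new-ahead")]
visited = linear_scan(initial, creations, size=6)
print(visited)
```

In this toy run the object created ahead of the cursor is picked up by the same scan, while the one created behind it is only seen by a later scan.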

To avoid impacting performance, the checking is largely done without locking. However, in the rare case that an inconsistency is found, LFSCK will lock the object(s) involved and re-check their consistency, to rule out transient errors caused by concurrent changes.
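The check-then-recheck pattern described above can be sketched roughly as follows. This is only an illustration of the general idea, assuming a per-object lock and a caller-supplied `check` callable; `verify` and the result strings are hypothetical names, not LFSCK functions.

```python
import threading


def verify(obj_lock, check):
    """Run `check` without locking first; re-run it under the lock on failure.

    obj_lock: lock protecting the object (stand-in for the real object lock)
    check:    callable returning True if the object looks consistent
    """
    # Fast path: the common case is a consistent object, so no lock is taken.
    if check():
        return "consistent"
    # Slow path: an apparent inconsistency is re-checked with the object
    # locked, so a concurrent update cannot be mistaken for corruption.
    with obj_lock:
        return "transient" if check() else "inconsistent"


lock = threading.Lock()

# The first (unlocked) check fails, the locked re-check passes.
outcomes = iter([False, True])
transient_result = verify(lock, lambda: next(outcomes))

# Both checks fail, so the inconsistency is real.
persistent_result = verify(lock, lambda: False)

print(transient_result, persistent_result)
```

The point of the design is that the lock is only paid for in the rare failing case, so the bulk of the scan does not contend with normal filesystem activity.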

Due to the dynamic nature of the filesystem, and the scale at which Lustre operates, we can't keep the system static during a full check. In some cases transient errors on modified objects may be deferred for a later scan.

All object updates are tagged with a transaction number that is monotonically increasing (per target), so in some cases LFSCK can ignore errors found on objects modified since the start of the scan; their consistency can be re-verified on a subsequent LFSCK run.
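As a rough sketch of how a transaction number could be used this way, the snippet below splits findings into those to report and those to leave for a later run. The `Finding` record, its field names, and the `triage` helper are assumptions made for illustration; the actual LFSCK bookkeeping may look quite different.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    fid: str            # made-up object identifier
    last_transno: int   # transaction number of the object's latest update
    problem: str


def triage(findings, scan_start_transno):
    """Split findings by whether the object changed after the scan started.

    Transaction numbers are only comparable within one target, so all
    findings passed in are assumed to come from the same target.
    """
    report, defer = [], []
    for f in findings:
        if f.last_transno > scan_start_transno:
            defer.append(f)   # modified during the scan: re-verify next run
        else:
            report.append(f)  # untouched since the scan began
    return report, defer


findings = [
    Finding("fid-A", last_transno=90, problem="dangling layout reference"),
    Finding("fid-B", last_transno=130, problem="missing link EA"),
]
report, defer = triage(findings, scan_start_transno=100)
print([f.fid for f in report], [f.fid for f in defer])
```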

Cheers, Andreas

On Nov 23, 2022, at 06:38, Saisha Kamat via lustre-devel <lustre-devel at lists.lustre.org> wrote:

Hi all,
I am a Ph.D. student at UNC-Charlotte, working on the Lustre File System Checker.

I have a question about how online LFSCK works. Specifically, how does LFSCK differentiate between old metadata and new metadata created during its execution? Will the new metadata be checked and, if so, when? Can you please point me to the source code so I can understand the details?

Thanks very much!
