[lustre-discuss] recovery MDT ".." directory entries (LU-5626)
paf at cray.com
Wed Nov 4 04:34:22 PST 2015
Our observation at the time was that lfsck did not add the fid to the .. dentry unless there was already space in the appropriate location. I don't remember digging into the details. (Since it meant lfsck namespace was behaving, in a sense, correctly, we were initially puzzled but decided it was all right. I seem to remember reading a comment somewhere that the developers decided rearranging the dentries was too hard, so they'd only add fids where space was already present.)
It's possible we didn't get that quite right, though it would have to be partial somehow - misplaced .. dentries with fids were definitely not universal after running the namespace lfsck. (Drawing on experience from other sites here as well.)
In any case, directories with bad .. dentries can be identified with fsck anyway.
From: Martin Hecht [hecht at hlrs.de]
Sent: Wednesday, November 04, 2015 3:42 AM
To: Patrick Farrell; Mohr Jr, Richard Frank (Rick Mohr)
Cc: lustre-discuss at lists.lustre.org
Subject: Re: [lustre-discuss] recovery MDT ".." directory entries (LU-5626)
On 11/04/2015 03:23 AM, Patrick Farrell wrote:
> PAF: Remember, the specific conditions are pretty tight. Created under 1.8, not empty (if it's empty, the .. dentry is not misplaced when moved) but also non-htree, then moved with dirdata enabled, and then grown to this larger size. How many existing (small) directories do you move and then add a bunch of files to? It's a pretty rare operation. We only hit it at Martin's site because of an automated tool they have to re-arrange user/job directories.
Well, not only because of the tool. In fact, once the tool has moved
the directories, no files are added to them anymore.
However, our mechanism gives users a reason to move their data
from time to time (that's not the intention of the mechanism, but that's
how some users react).
But I'm not quite sure anymore if moving the directories is really a
precondition to run into LU-5626.
We have run the background lfsck which adds the FID to the existing
dentries. This might be an important detail, because in our case a
second '..' entry containing the FID was presumably created by lfsck (in
the wrong place), and not by moving the directory. As far as I currently
understand, the user then only has to add some files to trigger the LBUG.
A subsequent e2fsck will not only find this particular directory but all
other small directories with a '..' entry in the wrong place. When
e2fsck tries to fix these directories, some entries are overwritten by
the FID and these files are then moved to lost+found.
If one of these first entries happens to be a small subdirectory, I
believe there is a chance to run into the same issue again, when you
move everything back to the original location after the e2fsck and
someone starts adding files in these subdirectories.
However, the preconditions are still quite narrow: small directories,
not empty, created without FID, then converted by lfsck (or
alternatively moved to a different place, which would also create the
second '..' entry). To trigger the LBUG, files need to be added to one of
these directories, and for a second occurrence of the LBUG the same
conditions must hold for another subdirectory, which must have been at
the very beginning of the directory.
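
The preconditions listed above can be written down as a small predicate, which may help when reasoning about which directories are at risk. This is only an illustrative model of the description in this mail: the class and field names are hypothetical and do not correspond to any Lustre, ldiskfs or lfsck data structure, and it assumes that either the lfsck conversion or a move with dirdata enabled can create the second '..' entry.

```python
from dataclasses import dataclass


@dataclass
class DirState:
    """Hypothetical summary of one MDT directory (not a real Lustre type)."""
    created_without_fid: bool      # e.g. created under 1.8
    is_empty: bool                 # empty directories are not affected
    is_small: bool                 # still small / non-htree
    converted_by_lfsck: bool       # lfsck added a FID to the '..' entry
    moved_with_dirdata: bool       # alternative: moved with dirdata enabled
    files_added_afterwards: bool   # directory grew after the step above


def has_misplaced_dotdot(d: DirState) -> bool:
    """True if the directory matches the narrow preconditions from this thread."""
    second_dotdot_created = d.converted_by_lfsck or d.moved_with_dirdata
    return (d.created_without_fid
            and not d.is_empty
            and d.is_small
            and second_dotdot_created)


def may_trigger_lbug(d: DirState) -> bool:
    """The LBUG additionally requires files to be added afterwards."""
    return has_misplaced_dotdot(d) and d.files_added_afterwards
```

For example, a small non-empty 1.8-era directory that was converted by lfsck and then receives new files satisfies may_trigger_lbug, while the same directory with no new files only satisfies has_misplaced_dotdot (and would still be reported by e2fsck).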