[lustre-discuss] recovery MDT ".." directory entries (LU-5626)

Martin Hecht hecht at hlrs.de
Mon Nov 2 09:30:58 PST 2015


Hi Chris and Patrick,

I was sick last week, so I only found this conversation today,
sorry.

On 10/27/2015 05:06 PM, Patrick Farrell wrote:
> If you read LU-5626 carefully, there's an explanation of the exact nature of the damage, and having that should let you make partial recoveries by hand.  I'm not familiar with the ll_recover_lost_found_objs tool, but I doubt it would prove helpful in this instance.
There is no tool like ll_recover_lost_found_objs for the MDT. On OSTs
it would be the right choice.

> Note that there's two forms to this corruption.  One is if you move a directory which was created before dirdata was enabled, then the '..' entry ends up in the wrong place.  This does not trouble Lustre, but fsck reports it as an error and will 'correct' it, which has the effect of (usually) overwriting one dentry in the directory when it creates a new '..' dentry in the correct location.
>
> I don't *think* that one causes the MDT to go read only, but I could be wrong.  I *think* what causes the MDT to go read only is the other problem:
>
> When you have a non-htree directory (not too many items in it, all directory entries in a single inode) that is in the bad state described above (with the '..' dentry in the wrong place after being moved) and that directory has enough files added to it that it becomes an htree directory, the resulting directory is corrupted more severely.  We never sorted out the precise details of this - I believe we chose to simply delete any directories in this state.  (I think lfsck did it for us, but can't recall for sure.)
If I recall correctly, moving (or renaming) the corrupted directory to
another place caused the MDT to go read-only. Adding more files, as
Patrick wrote above, is probably another trigger.

In our case we captured the full output of e2fsck, which contained the
original names and the inode numbers. fsck moved some of the files and
subdirectories of the corrupted directories to lost+found. With the
information in the e2fsck output we could move them back from
lost+found to their original place on the ldiskfs level (I parsed
the e2fsck output for a pattern matching the inode numbers and
generated a script from it). We had to repeat this a couple of times,
because some of the subdirectories moved to lost+found were either in
bad shape themselves or were damaged further when the owners later
added files to them or moved them around.
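
To illustrate the kind of parsing I mean, here is a minimal sketch in
Python. It is not the script we actually used. It assumes the e2fsck
log contains lines of the form "Second entry 'NAME' (inode=N) in
directory inode M should be '..'" and that the displaced inode shows up
in lost+found as "#N"; check both against your own log before relying
on it. The names parse_log and build_commands, the mount point /mnt/mdt
and the parent_paths mapping (directory inode to path below the mount
point, which you have to supply yourself) are made up for the example.

```python
import re
import shlex

# Assumed message shape; adapt it to what your e2fsck log actually contains.
ENTRY_RE = re.compile(
    r"Second entry '(?P<name>.+)' \(inode=(?P<ino>\d+)\) "
    r"in directory inode (?P<parent>\d+) should be '\.\.'"
)


def parse_log(text):
    """Return (inode, original name, parent directory inode) per matching line."""
    return [
        (int(m["ino"]), m["name"], int(m["parent"]))
        for m in ENTRY_RE.finditer(text)
    ]


def build_commands(entries, parent_paths, mnt="/mnt/mdt"):
    """Build mv commands for manual review; nothing is executed here.

    parent_paths maps a directory inode to its path below the mount point
    (hypothetical input). Entries whose parent is not in the map are skipped.
    """
    cmds = []
    for ino, name, parent in entries:
        target_dir = parent_paths.get(parent)
        if target_dir is None:
            continue
        src = f"{mnt}/lost+found/#{ino}"
        dst = f"{mnt}/{target_dir}/{name}"
        # -n: never overwrite an existing entry at the destination
        cmds.append(f"mv -n -- {shlex.quote(src)} {shlex.quote(dst)}")
    return cmds
```

The generated commands are only strings, so they can be written to a
file and read through before anything is run on the ldiskfs mount.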

So, if you have captured all your e2fsck output and you haven't yet
cleaned up lost+found, you can still recover the data. lfsck would
probably throw away the objects on the OSTs because it would treat them
as orphan objects left over after deleting the files.

best regards,
Martin

