[lustre-discuss] recovery MDT ".." directory entries (LU-5626)
chris.hunter at yale.edu
Wed Nov 4 07:24:07 PST 2015
On 11/02/2015 12:30 PM, Martin Hecht wrote:
> Hi Chris and Patrick,
> I was sick last week, so I only found this conversation today.
> On 10/27/2015 05:06 PM, Patrick Farrell wrote:
>> If you read LU-5626 carefully, there's an explanation of the exact nature of the damage, and having that should let you make partial recoveries by hand. I'm not familiar with the ll_recover_lost_found_objs tool, but I doubt it would prove helpful in this instance.
> There is no tool like ll_recover_lost_found_objs for the MDT. On OSTs
> it would be the right choice.
>> Note that there are two forms of this corruption. The first occurs when you move a directory that was created before dirdata was enabled: the '..' entry ends up in the wrong place. This does not trouble Lustre, but fsck reports it as an error and will 'correct' it, which (usually) has the effect of overwriting one dentry in the directory when it creates a new '..' dentry in the correct location.
>> I don't *think* that one causes the MDT to go read only, but I could be wrong. I *think* what causes the MDT to go read only is the other problem:
>> When you have a non-htree directory (not too many items in it, all directory entries in a single inode) that is in the bad state described above (with the '..' dentry in the wrong place after being moved) and that directory has enough files added to it that it becomes an htree directory, the resulting directory is corrupted more severely. We never sorted out the precise details of this - I believe we chose to simply delete any directories in this state. (I think lfsck did it for us, but can't recall for sure.)
> If I recall correctly, moving (or renaming) the corrupted directory to
> another place caused the MDT to go read-only; adding more files, as
> Patrick wrote, is probably another trigger.
> In our case we captured the full output of e2fsck, which contained the
> original names and the inodes. fsck moved some of the files and
> subdirectories of the corrupted directories to lost+found. With the
> information contained in the e2fsck output we could move them back from
> lost+found to their original place on the ldiskfs level (I parsed
> the e2fsck output for a pattern matching the inode numbers and created a
> script out of it). We had to repeat this a couple of times, because
> some of the subdirectories moved to lost+found were either in bad
> shape themselves or were further damaged later on, when the owners added
> files to them or moved them around.
> So, if you have captured all your e2fsck output and you haven't yet
> cleaned up lost+found, you can still recover the data. lfsck would
> probably throw away the objects on the OSTs because it thinks they are
> orphan objects left over after deleting the files.
> best regards,
Yes, I believe you want to (manually) recover the directories from
lost+found back to ROOT on the MDT before lfsck/oi_scrub runs. I don't
think lfsck on the MDT will impact orphan objects on the OSTs.
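
For anyone attempting the manual recovery Martin describes, here is a rough
sketch of turning captured e2fsck output into a list of "mv" commands to run
against the MDT mounted as ldiskfs. It relies on e2fsck naming reconnected
entries in lost+found "#<inode>". Everything else is an assumption on my
part: the regular expression is a placeholder, not a verified e2fsck message
format, and must be adapted to the lines your e2fsck version actually
printed; the function name and the /mnt/mdt mount point are made up for
illustration. Review the generated commands by hand before running any of
them.

```python
import re
import shlex

# ASSUMPTION: placeholder pattern. Adapt it to the messages in your own
# captured e2fsck log; it only needs named groups "inode" and "path".
ASSUMED_PATTERN = re.compile(
    r"inode (?P<inode>\d+) \((?P<path>/[^)]*)\)"
)


def build_restore_commands(fsck_lines, mount_point, pattern=ASSUMED_PATTERN):
    """Return one 'mv' command per inode found in the e2fsck output.

    fsck_lines  -- iterable of lines from the captured e2fsck log
    mount_point -- where the MDT is mounted as ldiskfs (hypothetical path)
    pattern     -- regex with named groups 'inode' and 'path'
    """
    commands = []
    seen = set()
    for line in fsck_lines:
        match = pattern.search(line)
        if match is None:
            continue
        inode = match.group("inode")
        if inode in seen:
            # Only the first mention of an inode is used.
            continue
        seen.add(inode)
        # e2fsck reconnects inodes into lost+found under the name "#<inode>".
        src = f"{mount_point}/lost+found/#{inode}"
        dst = f"{mount_point}/{match.group('path').lstrip('/')}"
        # -n: never overwrite an existing target; --: end of options.
        commands.append(f"mv -n -- {shlex.quote(src)} {shlex.quote(dst)}")
    return commands


if __name__ == "__main__":
    # Made-up sample lines in the assumed format, for illustration only.
    sample = [
        "Pass 3: Checking directory connectivity",
        "Unconnected directory inode 1234 (/ROOT/proj/data)",
        "Unconnected directory inode 1234 (/ROOT/proj/data)",
    ]
    for command in build_restore_commands(sample, "/mnt/mdt"):
        print(command)
```

Since some of the restored subdirectories may be damaged themselves, expect
to repeat the fsck, parse, and move cycle more than once, as Martin noted.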