[lustre-discuss] recovery MDT ".." directory entries (LU-5626)
chris.hunter at yale.edu
Tue Oct 27 10:52:49 PDT 2015
Thanks for sharing your experience; it looks like you did the bulk of the
troubleshooting in the Jira ticket.
I assume I should have a clean filesystem (i.e., run fsck first) before
disabling the dirdata feature?
After I disable dirdata, will I need to run fsck with the "-D" option?
FYI, the ll_recover_lost_found_objs tool recovers files from lost+found
on *OST* volumes (i.e., it moves them back into the /O/0/dXX directories) based on
their extended file attributes. See Section 37.5 of the HPDD manual.
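The /O/0/dXX placement mentioned above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's code: it assumes the conventional ldiskfs OST layout, in which an object with id N lives at O/0/d(N mod 32)/N, and the helper name ost_object_path and the 32-subdirectory constant are assumptions made for illustration. It only shows where an object would be moved once its id is known; reading the id from the object's extended attributes is not shown.

```python
import posixpath

# Assumption: the conventional ldiskfs OST layout spreads objects over
# 32 hashed subdirectories, d0 .. d31, under O/<group>/.
OST_SUBDIR_COUNT = 32


def ost_object_path(objid: int, group: int = 0) -> str:
    """Hypothetical helper: where an OST object with this id is expected
    to live, relative to the root of the mounted (ldiskfs) OST."""
    if objid < 0:
        raise ValueError("object id must be non-negative")
    subdir = f"d{objid % OST_SUBDIR_COUNT}"
    return posixpath.join("O", str(group), subdir, str(objid))


if __name__ == "__main__":
    # e.g. an object found in lost+found whose xattrs give id 12345
    print(ost_object_path(12345))  # O/0/d25/12345
```

Because an OST object's location follows from its id alone, it can be put back mechanically; a lost MDT dentry name cannot be reconstructed this way, which fits the doubt expressed in the reply below.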
chris.hunter at yale.edu
On 10/27/2015 12:06 PM, Patrick Farrell wrote:
> I had the joy of taking this one apart personally. We mostly let lfsck do the repair and moved on, accepting that some of the dentries were trashed. I think our field staff did some manual recovery of important items with the e2fsprogs tools, but the problem was not common enough for us to document a procedure.
> If you read LU-5626 carefully, you'll find an explanation of the exact nature of the damage, which should let you make partial recoveries by hand. I'm not familiar with the ll_recover_lost_found_objs tool, but I doubt it would prove helpful in this instance.
> Note that there are two forms of this corruption. The first occurs when you move a directory that was created before dirdata was enabled: the '..' entry ends up in the wrong place. This does not trouble Lustre, but fsck reports it as an error and will 'correct' it, which (usually) overwrites one dentry in the directory when fsck creates a new '..' dentry in the correct location.
> I don't *think* that one causes the MDT to go read-only, but I could be wrong. I *think* what causes the MDT to go read-only is the other problem:
> Take a non-htree directory (few enough entries that they all fit in a single directory block) that is in the bad state described above, with its '..' dentry in the wrong place after being moved. If enough files are then added to it that it becomes an htree directory, the resulting directory is much more severely corrupted. We never sorted out the precise details; I believe we chose to simply delete any directories in this state. (I think lfsck did it for us, but I can't recall for sure.)
> I'd advise reading LU-5626 with care, and I'd also suggest turning off 'dirdata' on your MDT until you have this under control. That will at least prevent any more directories from ending up in either of these bad states if you keep using the filesystem before updating Lustre to a version that includes the LU-5626 patch.
> - Patrick
> From: lustre-discuss [lustre-discuss-bounces at lists.lustre.org] on behalf of Chris Hunter [chris.hunter at yale.edu]
> Sent: Tuesday, October 27, 2015 10:22 AM
> To: lustre-discuss at lists.lustre.org
> Subject: [lustre-discuss] recovery MDT ".." directory entries (LU-5626)
> We have a Lustre 1.8 filesystem that was upgraded to Lustre 2.x, and the
> "dirdata" feature was enabled. We encountered the LU-5626/LU-2638 issue with
> ".." directory entries. Are there established recovery steps for this
> issue?
> If I run fsck, the affected directory entries will be moved into lost+found.
> I assume the next step is to run the ll_recover_lost_found_objs tool?
> Can you share any advice or experience about recovery?
> chris hunter
> chris.hunter at yale.edu
> lustre-discuss mailing list
> lustre-discuss at lists.lustre.org
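The first form of corruption described in the reply can be illustrated with a toy model of e2fsck's pass-2 '..' check. This is a simplified sketch, not the real code: it treats a linear directory block as a Python list of (name, inode) entries, ignores byte offsets and rec_len, and the names Dirent and check_dotdot are hypothetical. e2fsck expects '..' to be the second entry in a directory's first block; when it is not, it rewrites that slot in place as '..' pointing at the root inode (the real parent is filled in later, in pass 3), so whatever entry occupied the slot is dropped from the directory. An inode left with no directory entry is the kind fsck later reconnects under lost+found.

```python
from dataclasses import dataclass
from typing import List, Optional

# ext2/3/4 root directory inode number. e2fsck uses it as a placeholder
# parent for a rebuilt '..' entry and fixes the real parent in pass 3.
ROOT_INO = 2


@dataclass
class Dirent:
    """Hypothetical, simplified directory entry: just a name and an inode."""
    name: str
    inode: int


def check_dotdot(entries: List[Dirent]) -> Optional[Dirent]:
    """Toy model of e2fsck's pass-2 check that '..' is the second entry.

    Mutates `entries` in place, the way e2fsck rewrites the on-disk slot.
    Returns the entry that was overwritten, or None if '..' was already
    in the expected position.
    """
    if len(entries) < 2:
        raise ValueError("expected at least '.' and one more entry")
    second = entries[1]
    if second.name == "..":
        return None
    # Overwrite the second slot with a new '..' entry; the old entry's
    # name is gone, so its inode is no longer referenced from here.
    entries[1] = Dirent("..", ROOT_INO)
    return second


if __name__ == "__main__":
    # A directory whose '..' ended up out of position after a move.
    d = [Dirent(".", 5000), Dirent("results.dat", 5001), Dirent("..", 4000)]
    lost = check_dotdot(d)
    print(lost)                  # Dirent(name='results.dat', inode=5001)
    print([e.name for e in d])   # ['.', '..', '..']
```

The toy does not model what e2fsck does with the leftover out-of-place '..' entry, nor the second, htree-conversion form of the damage, whose details the thread says were never fully worked out.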