[lustre-discuss] recovery MDT ".." directory entries (LU-5626)
Patrick Farrell
paf at cray.com
Tue Oct 27 10:59:27 PDT 2015
Chris,
That's probably best, to be safe. By the way, this is one of those cases
where (if I remember right) you sometimes have to run fsck, let it correct
things, and then run it again, because it will find new things to object to
in the modified filesystem. So if you weren't already, running fsck repeatedly
until it stops complaining is best. (That's a best practice anyway.)
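
As a rough illustration of the repeat-until-clean approach, here is a minimal Python sketch. It assumes the tool is e2fsck (the thread only says "fsck"), run with -f and -y against an unmounted device, and it relies on the exit-status bits listed in the e2fsck man page (1 = errors corrected, 2 = reboot suggested, 4 = errors left uncorrected). The function names, the pass limit, and the device path are hypothetical.

```python
import subprocess
from typing import Callable, List

# Exit-status bits as documented in the e2fsck man page.
ERRORS_CORRECTED = 1
REBOOT_SUGGESTED = 2


def run_command(cmd: List[str]) -> int:
    """Run cmd and return its exit status."""
    return subprocess.run(cmd).returncode


def fsck_until_clean(device: str, max_passes: int = 5,
                     run: Callable[[List[str]], int] = run_command) -> int:
    """Repeat a forced check until it exits 0; return the number of passes.

    Assumption: the check is e2fsck with -f (force) and -y (answer yes).
    """
    for attempt in range(1, max_passes + 1):
        status = run(["e2fsck", "-f", "-y", device])
        if status == 0:
            return attempt  # nothing left to complain about
        # Anything beyond "corrected" / "reboot suggested" is not something
        # another pass is expected to fix, so stop and let a human look.
        if status & ~(ERRORS_CORRECTED | REBOOT_SUGGESTED):
            raise RuntimeError(f"check failed with exit status {status}")
    raise RuntimeError(f"still reporting errors after {max_passes} passes")


if __name__ == "__main__":
    # Hypothetical device path; the target must be unmounted.
    print(fsck_until_clean("/dev/mdt_device"))
```

The pass limit is there so that a filesystem which never comes up clean stops the loop instead of being rewritten indefinitely.
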
I can't find a -d or -D option in my copy of fsck, so I'm not sure what it does.
Best of luck,
- Patrick
On 10/27/2015 12:52 PM, Chris Hunter wrote:
> Hi Patrick,
> Thanks for sharing your experience; it looks like you did the bulk of the
> troubleshooting in the Jira ticket.
>
> I assume I should have a clean filesystem (i.e. run fsck first) before
> disabling the dirdata feature?
> After I disable dirdata, will I need to run fsck with the "-D" option?
>
> FYI, the ll_recover_lost_found_objs tool recovers files from
> lost+found on *OST* volumes (i.e. moves them back into the /O/0/dXX
> directory) based on extended file attributes. See Section 37.5 of the
> HPDD manual.
>
> thanks
> chris hunter
> chris.hunter at yale.edu
>
> On 10/27/2015 12:06 PM, Patrick Farrell wrote:
>> Chris,
>>
>> I had the joy of taking this one apart personally. We mostly let
>> lfsck do the repair and moved on, accepting that some of the dentries
>> were trashed. I think, for important things, our field staff did
>> some manual recovery with the e2fsprogs tools, but it was not a
>> common enough problem that we documented a procedure.
>>
>> If you read LU-5626 carefully, there's an explanation of the exact
>> nature of the damage, and having that should let you make partial
>> recoveries by hand. I'm not familiar with the
>> ll_recover_lost_found_objs tool, but I doubt it would prove helpful
>> in this instance.
>>
>> Note that there are two forms of this corruption. In the first, if you
>> move a directory that was created before dirdata was enabled, the '..'
>> entry ends up in the wrong place. This does not trouble Lustre, but
>> fsck reports it as an error and will 'correct' it, which has the
>> effect of (usually) overwriting one dentry in the directory when it
>> creates a new '..' dentry in the correct location.
>>
>> I don't *think* that one causes the MDT to go read-only, but I could
>> be wrong. I *think* what causes the MDT to go read-only is the other
>> problem:
>>
>> When you have a non-htree directory (not too many items in it, all
>> directory entries in a single inode) that is in the bad state
>> described above (with the '..' dentry in the wrong place after being
>> moved) and that directory has enough files added to it that it
>> becomes an htree directory, the resulting directory is corrupted more
>> severely. We never sorted out the precise details of this - I
>> believe we chose to simply delete any directories in this state. (I
>> think lfsck did it for us, but can't recall for sure.)
>>
>> I'd advise reading LU-5626 with care, and I'd also suggest you might
>> turn off 'dirdata' on your MDT until you have this under control.
>> That will at least prevent any more directories from ending up in
>> either of these bad states if you use the filesystem without updating
>> Lustre to a version with the LU-5626 patch in it.
>>
>> - Patrick
>> ________________________________________
>> From: lustre-discuss [lustre-discuss-bounces at lists.lustre.org] on
>> behalf of Chris Hunter [chris.hunter at yale.edu]
>> Sent: Tuesday, October 27, 2015 10:22 AM
>> To: lustre-discuss at lists.lustre.org
>> Subject: [lustre-discuss] recovery MDT ".." directory entries (LU-5626)
>>
>> We have a Lustre 1.8 filesystem that was upgraded to Lustre 2.x, and the
>> "dirdata" feature was enabled. We encountered the LU-5626/LU-2638 issue
>> with ".." directory entries. Are there established recovery steps for
>> this issue?
>>
>> If I run fsck, the directory entries will be moved into lost+found.
>> I assume the next step is to run the ll_recover_lost_found_objs tool?
>>
>> Can you share any advice or experience about recovery?
>>
>> thanks,
>> chris hunter
>> chris.hunter at yale.edu