[lustre-discuss] recovery MDT ".." directory entries (LU-5626)

Patrick Farrell paf at cray.com
Tue Oct 27 10:59:27 PDT 2015


That's probably best, to be safe.  By the way, this is one where (if I 
remember right) you sometimes run fsck, let it correct things, and then 
must run it again, as it will find new things to object to in the 
modified filesystem.  So if you weren't already, running fsck repeatedly 
until it doesn't complain is best.  (That's a best practice anyway.)
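
The "repeat until clean" loop could be scripted roughly as below.  This 
is only a sketch, assuming the fsck in question is e2fsck and the MDT 
device is unmounted.  The function names, the device path and the 
max_passes cap are made up for illustration; the -f and -y flags and the 
exit status bits are as described in the e2fsck man page.  Keep in mind 
that letting fsck 'correct' this particular problem can overwrite 
dentries, so a backup or a read-only (-n) pass first is worth considering.

```python
import subprocess

# e2fsck's exit status is a bit mask (see the e2fsck man page).
FSCK_OK = 0                          # no errors found
FSCK_FATAL = 4 | 8 | 16 | 32 | 128   # uncorrected errors, operational or
                                     # usage error, cancelled, library error


def run_e2fsck(device):
    """Run one forced e2fsck pass on an unmounted device; return its status."""
    # -f forces a check even if the filesystem is marked clean,
    # -y answers "yes" to every repair prompt.
    return subprocess.call(["e2fsck", "-f", "-y", device])


def fsck_until_clean(device, run_pass=run_e2fsck, max_passes=5):
    """Repeat fsck passes until one reports no errors.

    Returns the number of passes used.  Raises RuntimeError if a pass
    fails outright, or if the filesystem is still not clean after
    max_passes (a hypothetical safety cap, not an e2fsck setting).
    """
    for pass_number in range(1, max_passes + 1):
        status = run_pass(device)
        if status == FSCK_OK:
            return pass_number
        if status & FSCK_FATAL:
            raise RuntimeError(
                "fsck pass %d failed with status %d" % (pass_number, status))
        # Status 1 or 2 (or both): errors were corrected, so check again.
    raise RuntimeError("still not clean after %d passes" % max_passes)
```

Something like fsck_until_clean("/dev/mdt_device") would then run it, 
where /dev/mdt_device is a placeholder for the real MDT block device.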

I can't find a -d or -D option in my copy of fsck, so I'm not sure what 
it would do.

Best of luck,
- Patrick

On 10/27/2015 12:52 PM, Chris Hunter wrote:
> Hi Patrick,
> Thanks for sharing your experience, looks like you did the bulk of 
> troubleshooting in the Jira ticket.
> I assume I should have a clean filesystem (i.e. run fsck first) before 
> disabling the dirdata feature?
> After I disable dirdata, will I need to run fsck with the "-D" option?
> FYI, the ll_recover_lost_found_objs tool recovers files from 
> lost+found on *OST* volumes (i.e. moves them back into the /O/0/dXX 
> directory) based on extended file attributes. See Section 37.5 of the 
> HPDD manual.
> thanks
> chris hunter
> chris.hunter at yale.edu
> On 10/27/2015 12:06 PM, Patrick Farrell wrote:
>> Chris,
>> I had the joy of taking this one apart personally.  We mostly let 
>> lfsck do the repair and moved on, accepting that some of the dentries 
>> were trashed.  I think, for important things, our field staff did 
>> some manual recovery with the e2fsprogs tools, but it was not a 
>> common enough problem that we documented a procedure.
>> If you read LU-5626 carefully, there's an explanation of the exact 
>> nature of the damage, and having that should let you make partial 
>> recoveries by hand.  I'm not familiar with the 
>> ll_recover_lost_found_objs tool, but I doubt it would prove helpful 
>> in this instance.
>> Note that there are two forms of this corruption.  One is if you move 
>> a directory which was created before dirdata was enabled; the '..' 
>> entry then ends up in the wrong place.  This does not trouble Lustre, 
>> but fsck reports it as an error and will 'correct' it, which has the 
>> effect of (usually) overwriting one dentry in the directory when it 
>> creates a new '..' dentry in the correct location.
>> I don't *think* that one causes the MDT to go read only, but I could 
>> be wrong.  I *think* what causes the MDT to go read only is the other 
>> problem:
>> When you have a non-htree directory (not too many items in it, all 
>> directory entries in a single inode) that is in the bad state 
>> described above (with the '..' dentry in the wrong place after being 
>> moved) and that directory has enough files added to it that it 
>> becomes an htree directory, the resulting directory is corrupted more 
>> severely.  We never sorted out the precise details of this - I 
>> believe we chose to simply delete any directories in this state.  (I 
>> think lfsck did it for us, but can't recall for sure.)
>> I'd advise reading LU-5626 with care, and I'd also suggest you might 
>> turn off 'dirdata' on your MDT until you have this under control.  
>> That will at least prevent any more directories from ending up in 
>> either of these bad states if you use the filesystem without updating 
>> Lustre to a version with the LU-5626 patch in it.
>> - Patrick
>> ________________________________________
>> From: lustre-discuss [lustre-discuss-bounces at lists.lustre.org] on 
>> behalf of Chris Hunter [chris.hunter at yale.edu]
>> Sent: Tuesday, October 27, 2015 10:22 AM
>> To: lustre-discuss at lists.lustre.org
>> Subject: [lustre-discuss]  recovery MDT ".." directory entries (LU-5626)
>> We have a Lustre 1.8 filesystem that was upgraded to Lustre 2.x, and
>> the "dirdata" feature was enabled. We encountered the LU-5626/LU-2638
>> issue with ".." directory entries. Are there established recovery
>> steps for this issue?
>> If I run fsck, the directory entries will be moved into lost+found.
>> I assume the next step is to run the ll_recover_lost_found_objs tool ?
>> Can you share any advice/experience about recovery ?
>> thanks,
>> chris hunter
>> chris.hunter at yale.edu
