[Lustre-discuss] OST crash with group descriptors

Mag Gam magawake at gmail.com
Thu Mar 12 20:15:47 PDT 2009


Nice tip.

This should go into the Knowledge Base -- if one exists  ;-)

On Thu, Mar 12, 2009 at 11:03 PM, thhsieh
<thhsieh at piano.rcas.sinica.edu.tw> wrote:
> Hello,
>
> From our test, it really gets 5/6 of the FILES. So this technique is
> quite useful for emergency recovery.
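>
> (A side note: to see in advance which files have a stripe on a given
> OST, something like
>
>        lfs find --obd cwork2-OST0003_UUID /cwork2
>
> should list them. The OST UUID is taken from the example below, and
> /cwork2 just stands for the client mount point.)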
>
> There is another tip I can share here. After following Andreas's
> suggestions, we finally got back all the OSTs, but there are still a
> lot of files that cannot be recovered. With the "ls -l" command you
> can very easily identify such files:
>
> -rw-r--r-- 1 thhsieh thhsieh  61440008 2007-05-21 18:49 EIV27
> -rw-r--r-- 1 thhsieh thhsieh  61440008 2007-05-21 18:49 EIV28
> ?--------- ? ?       ?               ?                ? EIV29
> -rw-r--r-- 1 thhsieh thhsieh  61440008 2007-05-21 18:49 EIV30
> -rw-r--r-- 1 thhsieh thhsieh     19488 2008-09-18 16:04 fort.8
>
> where "EIV29" is the corrupted file.
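>
> If there are many directories to check, one way to list every such
> broken entry (just a sketch, with an example path) is to stat each
> name and report the ones that fail:
>
>        for f in /cwork2/thhsieh/*; do
>                stat "$f" > /dev/null 2>&1 || echo "corrupted: $f"
>        done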
>
> I suspect that these files may have been attached to the "lost+found"
> directory of the OSTs. Hence the first step is to identify which OST
> the file is located on. Using this command:
>
>        lfs getstripe EIV29
>
> it gives:
>
> OBDS:
> 0: cwork2-OST0000_UUID ACTIVE
> 1: cwork2-OST0001_UUID ACTIVE
> 2: cwork2-OST0002_UUID ACTIVE
> 3: cwork2-OST0003_UUID ACTIVE
> 4: cwork2-OST0004_UUID ACTIVE
> 5: cwork2-OST0005_UUID ACTIVE
> EIV29
>        obdidx           objid          objid            group
>             3          118557        0x1cf1d                0
>
> which means that the file should be in cwork2-OST0003. Next I find the
> location of cwork2-OST0003 in our cluster. There are several ways to
> do that. A standard way is described in the Users Guide:
>
> cat /proc/fs/lustre/osc/cwork2-OST0003*/ost_conn_uuid
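>
> (On versions that provide "lctl get_param", the equivalent query
> should be
>
>        lctl get_param osc.cwork2-OST0003*.ost_conn_uuid
>
> which shows the NID of the server currently hosting that OST.)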
>
> In case that node has several OSTs, you can use the following command
> to identify them:
>
> dumpe2fs /dev/sda1 | head
>
> so you can see the OST name in the first line, "Filesystem volume name:".
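>
> For the OST in question, the output should begin with something like:
>
>        Filesystem volume name:   cwork2-OST0003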
>
> Now we have to shut down the Lustre filesystem completely (unmount the
> clients and OSTs), and remount the OST we want to check as ldiskfs:
>
> mount -t ldiskfs /dev/sda1 /mnt
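>
> (Since we only need to read the OST contents, it is safer to mount it
> read-only while inspecting, e.g.
>
>        mount -t ldiskfs -o ro /dev/sda1 /mnt
>
> and copy any recovered data to some other location.)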
>
> Then in /mnt/lost+found/ you may see a lot of lost files, but it is
> still difficult to identify which one is which.
>
> If we know some features of the original file, e.g., its creation or
> last modification time, its rough size, its owner, or its type, then it
> is still possible to pick out the correct one. For example, yesterday
> I managed to pick out a "Zip archive" file from thousands of files by
> selecting the files belonging to the owner and using
>
>        file <filename>
>
> to check their format. Very fortunately there was only one file in "Zip"
> format, so that was it.
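>
> That search can also be scripted. A rough sketch, where the owner,
> minimum size, and expected type are just examples of what one might
> know about the lost file:
>
>        find /mnt/lost+found -type f -user thhsieh -size +50M \
>                -exec file {} + | grep -i 'zip archive'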
>
> Since this technique is very tedious and still cannot guarantee
> recovery, it is only useful for getting back a few of the most
> critical files. However, if you do have a very important file that
> cannot be lost, this approach may be worth trying.
>
> Cheers,
>
> T.H.Hsieh
>
>
> On Thu, Mar 12, 2009 at 09:08:53PM -0400, Mag Gam wrote:
>> This was a very interesting thread to read. I too have been in the
>> same situation and it really stunk! I just went ahead and restored the
>> whole 10T filesystem :-(
>>
>> Seeing Andreas at work is art :-)
>>
>> I have a question about this:
>>
>> Would the OP get 5/6 of his DATA or FILES? 5/6 of the DATA is useless,
>> but 5/6 of the FILES is amazing. I was under the impression that files
>> would be striped across the OSTs even if you don't enable striping.
>>
>> TIA
>>
>>
>>
>>
>> On Tue, Mar 10, 2009 at 11:57 AM, Ms. Megan Larko <dobsonunit at gmail.com> wrote:
>> > Hi T.H.,
>> >
>> > I do not envy your situation. I have been in a very similar
>> > scenario. Andreas Dilger gave me some very good information on
>> > deactivating the bad OST and then copying the remaining good files.
>> > It worked for me.
>> >
>> > The thread is archived in cyber-space under:
>> > http://osdir.com/ml/file-systems.lustre.user/2008-06/msg00249.html
>> >
>> > Good Luck,
>> > megan
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>


