[Lustre-discuss] OST crash with group descriptors

thhsieh thhsieh at piano.rcas.sinica.edu.tw
Thu Mar 12 20:03:10 PDT 2009


Hello,

From our test, it really recovers 5/6 of the FILES. So this
technique is actually quite useful for emergency recovery.

There is another tip I can share here. After following Andreas's
suggestions, we finally got back all the OSTs. But there are still
a lot of files that cannot be recovered. With the "ls -l" command,
you can easily identify such files:

-rw-r--r-- 1 thhsieh thhsieh  61440008 2007-05-21 18:49 EIV27
-rw-r--r-- 1 thhsieh thhsieh  61440008 2007-05-21 18:49 EIV28
?--------- ? ?       ?               ?                ? EIV29
-rw-r--r-- 1 thhsieh thhsieh  61440008 2007-05-21 18:49 EIV30
-rw-r--r-- 1 thhsieh thhsieh     19488 2008-09-18 16:04 fort.8

where "EIV29" is the corrupted file.
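
For a directory with many entries, the damaged ones can be filtered out
of the "ls -l" listing automatically. The following is only a minimal
sketch: the function name unreadable_entries is my own, and it assumes
that file names contain no whitespace, since the name is taken as the
last field of each line.

```shell
# Print the names of entries that "ls -l" could not stat.
# Reads "ls -l" output on stdin; such lines start with "?".
# Assumption: file names contain no whitespace (name = last field).
unreadable_entries() {
    awk 'substr($1, 1, 1) == "?" { print $NF }'
}

# Usage, from inside the affected directory:
#   ls -l 2>/dev/null | unreadable_entries
```

Fed with the listing above, it prints only EIV29.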

I suspect that these files may have ended up in the "lost+found"
directory of the OSTs. Hence the first step is to identify which
OST the file is located on. Using this command:

	lfs getstripe EIV29

it gives:

OBDS:
0: cwork2-OST0000_UUID ACTIVE
1: cwork2-OST0001_UUID ACTIVE
2: cwork2-OST0002_UUID ACTIVE
3: cwork2-OST0003_UUID ACTIVE
4: cwork2-OST0004_UUID ACTIVE
5: cwork2-OST0005_UUID ACTIVE
EIV29
        obdidx           objid          objid            group
             3          118557        0x1cf1d                0

which means that the file should be on cwork2-OST0003. Next I find the
location of cwork2-OST0003 in our cluster. There are several ways to
do that. A standard way is described in the UsersGuide:

cat /proc/fs/lustre/osc/cwork2-OST0003*/ost_conn_uuid
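
The two lookups (file -> OST, OST -> server) can be chained together.
What follows is a rough sketch, not a tested tool: the function names
stripe_osts and ost_server are my own, and the parsing assumes the
"lfs getstripe" output has exactly the layout shown above (an OBDS list
followed by an obdidx/objid table). The per-OST file under
/proc/fs/lustre/osc is ost_conn_uuid.

```shell
# Map each stripe of a file to its OST name and object id.
# Reads "lfs getstripe <file>" output on stdin, prints "<OST name> <objid>".
stripe_osts() {
    awk '
        # OBDS list lines, e.g. "3: cwork2-OST0003_UUID ACTIVE"
        $1 ~ /^[0-9]+:$/ { idx = $1; sub(/:$/, "", idx); uuid[idx] = $2; next }
        # Header of the object table
        $1 == "obdidx"   { in_table = 1; next }
        # Object rows, e.g. "3 118557 0x1cf1d 0"
        in_table && $1 ~ /^[0-9]+$/ {
            name = uuid[$1]; sub(/_UUID$/, "", name)
            print name, $2
        }
    '
}

# Print the connection UUID of the server for one OST, as seen from a client.
# The optional second argument (the proc directory) exists only for testing.
ost_server() {
    ost=$1
    root=${2:-/proc/fs/lustre/osc}
    cat "$root/$ost"*/ost_conn_uuid
}

# Usage on a client:
#   lfs getstripe EIV29 | stripe_osts      # prints: cwork2-OST0003 118557
#   ost_server cwork2-OST0003
```

For a file striped over several OSTs, stripe_osts prints one line per
object, so every listed OST would have to be checked.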

If that node has several OSTs, you can run this command on each
device to identify them:

dumpe2fs /dev/sda1 | head

so you can see the OST name in the first line: "Filesystem volume name:".
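
When a server has several devices, the label can be extracted in a loop
instead of reading the output by eye. This is just a sketch: the
function name volume_name and the second device /dev/sdb1 are made up
for illustration, so substitute your own device list.

```shell
# Print the volume label found in dumpe2fs superblock output (read on stdin).
volume_name() {
    sed -n 's/^Filesystem volume name:[[:space:]]*//p'
}

# Usage (as root, on the OSS; the device names are only examples):
#   for dev in /dev/sda1 /dev/sdb1; do
#       printf '%s\t%s\n' "$dev" "$(dumpe2fs -h "$dev" 2>/dev/null | volume_name)"
#   done
```

The -h option makes dumpe2fs print only the superblock information,
which is all that is needed here.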

Now we have to shut down the Lustre filesystem completely (umount clients,
OSTs), and remount the OST we want to check as ldiskfs:

mount -t ldiskfs /dev/sda1 /mnt

Then in /mnt/lost+found/ you may see a lot of lost files.
But it is still difficult to identify which one is which.

If we know some features of the original file, e.g., its creation or
last modification time, its rough size, its owner, or its type, then it
is still possible to pick out the correct one. For example, yesterday
I managed to pick out the "Zip archive" file from thousands of
files, by selecting the files belonging to the owner and using

	file <filename>

to check their format. Very fortunately there was only one "Zip"
format file, so that was it.
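
The same kind of narrowing can be done with "find" when the owner and
the size are known. This is a sketch under assumptions: the function
name is my own, and the size in the usage line is only a guess that
EIV29 has the same size as its neighbours in the listing above. The
size of an object in lost+found equals the file size only if the file
has a single stripe, as EIV29 does.

```shell
# List regular files under DIR owned by OWNER with a size of exactly BYTES.
# OWNER may be a user name or a numeric uid.
candidates_by_owner_and_size() {
    dir=$1 owner=$2 bytes=$3
    find "$dir" -type f -user "$owner" -size "${bytes}c"
}

# Usage on the OST mounted as ldiskfs (the size is a guess, see above):
#   candidates_by_owner_and_size /mnt/lost+found thhsieh 61440008
```

Each remaining candidate can then be checked with "file" as described
above.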

Since this technique is very tedious and still cannot guarantee
recovery, it is only useful for recovering a few of the most critical
files.  However, if you do have a very important file which cannot be
lost, then this approach may be worth trying.

Cheers,

T.H.Hsieh


On Thu, Mar 12, 2009 at 09:08:53PM -0400, Mag Gam wrote:
> This was a very interesting thread to read. I too have been in the
> same situation and it really stunk! I just went ahead and restored the
> filesystem 10T :-(
> 
> Seeing Andreas at work is  art :-)
> 
> I have a question about this:
> 
> Would the OP get 5/6 of his DATA or FILES? 5/6 of DATA is useless!
> However, 5/6 of Files is amazing.  I was under the impression the file
> would even be striped across (even if you don't enable striping).
> 
> TIA
> 
> 
> 
> 
> On Tue, Mar 10, 2009 at 11:57 AM, Ms. Megan Larko <dobsonunit at gmail.com> wrote:
> > Hi T.H.,
> >
> > I do not envy your situation.   I have been in a very similar
> > scenario.   Andreas Dilger gave me some very good information on
> > deactivating the bad OST and then copying the remaining good files.
> > It worked for me.
> >
> > The thread is archived in cyber-space under:
> > http://osdir.com/ml/file-systems.lustre.user/2008-06/msg00249.html
> >
> > Good Luck,
> > megan
> > _______________________________________________
> > Lustre-discuss mailing list
> > Lustre-discuss at lists.lustre.org
> > http://lists.lustre.org/mailman/listinfo/lustre-discuss
> >


