[Lustre-discuss] OST crash with group descriptors

Andreas Dilger adilger at sun.com
Fri Mar 13 05:23:04 PDT 2009


On Mar 13, 2009  11:03 +0800, thhsieh wrote:
> There is another tip I can share here. After following Andreas's
> suggestions, we finally got all the OSTs back. But there are still
> a lot of files that cannot be recovered. If you use the "ls -l"
> command, you can easily identify such files:
> 
> -rw-r--r-- 1 thhsieh thhsieh  61440008 2007-05-21 18:49 EIV27
> -rw-r--r-- 1 thhsieh thhsieh  61440008 2007-05-21 18:49 EIV28
> ?--------- ? ?       ?               ?                ? EIV29
> -rw-r--r-- 1 thhsieh thhsieh  61440008 2007-05-21 18:49 EIV30
> -rw-r--r-- 1 thhsieh thhsieh     19488 2008-09-18 16:04 fort.8
> 
> where "EIV29" is the corrupted file.

Right, because "ls -l" got an error when reading the size for
this file.
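On a client, you can also collect every such file instead of scanning "ls -l" output by eye: stat each entry and record the ones that fail. The sketch below is a minimal Python version. It assumes that a file whose OST object is missing raises OSError from stat(), which is what makes "ls -l" print "?". The exact errno (often EIO) is also an assumption, and the helper name find_unstattable is hypothetical.

```python
import os


def find_unstattable(root):
    """Walk `root` and return (path, error string) for every entry whose
    stat() fails.

    Assumption: on a Lustre client, a file whose OST object is missing or
    unreadable raises OSError from stat(), the same failure that makes
    "ls -l" show "?" for its size and attributes.
    """
    bad = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                # follow_symlinks=False (like lstat) so a dangling symlink
                # is not reported as a damaged file
                os.stat(path, follow_symlinks=False)
            except OSError as exc:
                bad.append((path, exc.strerror))
    return bad
```

Usage: call find_unstattable("/mnt/lustre/some/dir") on the client mount point, then print the returned list.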

> Then in /mnt/lost+found/ you may see a lot of lost files, but it
> is still difficult to identify which one is which.
> 
> If we know some features of the original file, e.g., its creation or
> last modification time, its approximate size, its owner, or its type,
> then it is still possible to pick out the correct one. For example,
> yesterday I picked out the "Zip archive" file from thousands of
> files by selecting the files belonging to the owner and using the
> 	file <filename>
> 
> command to check each file's format. Fortunately there was only one
> "Zip" format file, so that was it.
> 
> Since this technique is very tedious and still cannot guarantee that
> files are recovered, it is only useful for recovering a few of the most
> critical files. However, if you have a very important file that cannot
> be lost, then it may be worth trying.
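You can script the manual search described above instead of running "file" on each candidate by hand. The Python sketch below makes several assumptions. You supply the owner's numeric uid and an approximate size range. The Zip local file header signature (the bytes "PK\x03\x04") stands in for what "file" reports as a Zip archive, and an empty Zip archive would not match. The helper name candidates() and its parameters are hypothetical.

```python
import os

ZIP_MAGIC = b"PK\x03\x04"  # Zip local file header signature


def candidates(lost_found, uid, min_size=0, max_size=None, magic=ZIP_MAGIC):
    """Return sorted paths of regular files directly in `lost_found` that
    are owned by `uid`, fall within [min_size, max_size], and whose first
    bytes equal `magic`."""
    hits = []
    with os.scandir(lost_found) as it:
        for entry in it:
            try:
                st = entry.stat(follow_symlinks=False)
                is_file = entry.is_file(follow_symlinks=False)
            except OSError:
                continue  # skip entries that cannot be stat'ed
            if not is_file or st.st_uid != uid:
                continue
            if st.st_size < min_size:
                continue
            if max_size is not None and st.st_size > max_size:
                continue
            try:
                with open(entry.path, "rb") as f:
                    head = f.read(len(magic))
            except OSError:
                continue  # unreadable object, skip it
            if head == magic:
                hits.append(entry.path)
    return sorted(hits)
```

Filtering on owner and size first keeps the number of files that have to be opened small. Other formats can be matched by passing a different magic value.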

There is a tool specifically for this, "ll_recover_lost_found_objs",
which I mentioned in my earlier email. It runs against the OST
filesystem mounted as ldiskfs:

Usage: ./lustre/utils/ll_recover_lost_found_objs [-hv] -d lost+found_directory
You need to mount the corrupted OST filesystem and provide the path of its
lost+found directory as the -d option, for example:
ll_recover_lost_found_objs -d /mnt/ost/lost+found


This will move all (or at least most) of the objects from lost+found
back to their place in the O/0/d* directories, and you will have most
of your files back.
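The O/0/d* layout is simple enough to compute yourself, for example to check where a given object should end up. The Python sketch below assumes the default of 32 subdirectories, with each object stored as O/<group>/d<objid mod 32>/<objid>. Verify this against your Lustre version. The function name object_path is hypothetical, and the tool above should still do the actual move.

```python
import os

SUBDIR_COUNT = 32  # ASSUMPTION: default number of O/<group>/d* subdirectories


def object_path(ost_root, objid, group=0, subdir_count=SUBDIR_COUNT):
    """Return the assumed on-disk location of OST object `objid`:
    <ost_root>/O/<group>/d<objid mod subdir_count>/<objid>."""
    subdir = f"d{objid % subdir_count}"
    return os.path.join(ost_root, "O", str(group), subdir, str(objid))
```

For example, object_path("/mnt/ost", 12345) gives "/mnt/ost/O/0/d25/12345".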


The first time Lustre writes to an object, it saves the MDS inode number
and the objid as an extended attribute on the object, so that if the
directory structure on the OST is corrupted, the objects can still be
recovered, as you need to do here.
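To look at that attribute on a single object in lost+found yourself, you can read it with os.getxattr() on the ldiskfs-mounted OST. This Python sketch rests on several loudly stated assumptions. The attribute name is taken to be "trusted.fid". The value is taken to be laid out like the Lustre 1.6/1.8-era struct filter_fid, stored little-endian: a 64-bit MDS inode number, 32-bit generation, 32-bit type, 64-bit objid and 64-bit group. Check both against your version's headers. The helper names are hypothetical.

```python
import os
import struct

# ASSUMPTION: attribute name and binary layout modeled on the 1.6/1.8-era
# "struct filter_fid"; verify against your Lustre version before relying on it.
FID_XATTR = "trusted.fid"
FILTER_FID = struct.Struct("<QIIQQ")  # mds inode, generation, type, objid, group


def parse_filter_fid(raw):
    """Decode the assumed filter_fid xattr value into a dict."""
    if len(raw) < FILTER_FID.size:
        raise ValueError(f"xattr too short: {len(raw)} bytes")
    ino, gen, ftype, objid, group = FILTER_FID.unpack_from(raw)
    return {"mds_ino": ino, "mds_gen": gen, "type": ftype,
            "objid": objid, "group": group}


def read_filter_fid(path):
    """Read and decode the xattr of one object on the ldiskfs-mounted OST.
    Linux only; reading trusted.* attributes normally requires root."""
    return parse_filter_fid(os.getxattr(path, FID_XATTR, follow_symlinks=False))
```

The decoded objid is what places the object back under O/0/d*, and the MDS inode number ties it to the file it belongs to.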

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.



