[Lustre-discuss] Completely lost MGT/MDT

Dilger, Andreas andreas.dilger at intel.com
Fri Jun 28 15:24:51 PDT 2013


On 2013/28/06 3:25 PM, "Andrus, Brian Contractor" <bdandrus at nps.edu> wrote:

>Basically, I was adding capacity to a system while doing a fresh install.
>Turns out /dev/sda which used to be the disk in the bottom slot became
>the disk in the top slot instead.
>That happened to be where the MDT was, which was promptly repartitioned
>and formatted.
>
>Not exactly something I was expecting....

Presumably you have no backups or snapshots of the MDT device?  Even an
out-of-date copy would be worth restoring, since Lustre can handle a lot
of inconsistency between the MDT and OSTs, even without running lfsck.

Also, there was once a similar situation with a reformatted MDT that was
partly recovered using the "ext3grep" utility.  This allowed finding the
filename->inode mappings in the dirents in directory leaf blocks, and the
".." dirent allowed connecting the parent directories.  In Lustre 2.x, the
"link" xattr on the MDT inodes could also be used to recover the filenames
even if the directory entries are lost.

This won't help as much if the whole disk has been overwritten by an OS
install, but if only part of the MDT was overwritten you may be surprised
how much is recoverable with ext4.

The first order of business is to make a copy of the whole disk before you
try any further changes (this lets you try things and restart without
losing any data if things go badly).
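
A minimal sketch of that first copy, assuming Python 3 is available on
whatever rescue system you boot.  The paths are placeholders, and the
function name "image_device" is made up for illustration.  It reads the
source in fixed-size chunks, writes them to an image file, and returns a
SHA-256 digest so that later working copies can be checked against the
original image.  Plain "dd" does the same job; this only spells out the
idea.

```python
import hashlib

CHUNK = 4 * 1024 * 1024  # 4 MiB per read; an arbitrary choice


def image_device(src_path, dst_path, chunk=CHUNK):
    """Copy src_path to dst_path byte for byte.

    Returns (bytes_copied, sha256_hex_digest).
    """
    digest = hashlib.sha256()
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            buf = src.read(chunk)
            if not buf:          # end of device or file
                break
            dst.write(buf)
            digest.update(buf)
            copied += len(buf)
    return copied, digest.hexdigest()


# Hypothetical usage (paths are placeholders, run as root):
#   size, sha = image_device("/dev/XXX", "/backup/mdt-before-recovery.img")
```

Do all experiments on a second copy of the image, never on the only one.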

Repartition the disk as it was before (possibly without any partition
table at all for Lustre; if the device is not too huge, it could instead
be dumped into an image file).  Then build and run the "findsuper" utility
from the e2fsprogs code (I've attached it here) and try to find any
existing (old) superblocks from before the reformat.  You can tell
superblocks from the same filesystem by the same start/end/blocks and
increasing group number:

byte_offset  byte_start     byte_end  fs_blocks  blksz  grp  mkfs/mount_time           sb_uuid   label
    1049600     1048576    525336576     512000   1024    0  Wed Sep 12 16:39:47 2012  8f8531a2
    9438208     1048576    525336576     512000   1024    1  Wed Sep 12 16:39:47 2012  8f8531a2
   26215424     1048576    525336576     512000   1024    3  Wed Sep 12 16:39:47 2012  8f8531a2
   42992640     1048576    525336576     512000   1024    5  Wed Sep 12 16:39:47 2012  8f8531a2
   59769856     1048576    525336576     512000   1024    7  Wed Sep 12 16:39:47 2012  8f8531a2
   76547072     1048576    525336576     512000   1024    9  Wed Sep 12 16:39:47 2012  8f8531a2
  135266304     1048576   8590983168    2097152   4096    1  Tue Jan 18 15:06:12 2011  e1e13f16  boot
  210764800     1048576    525336576     512000   1024   25  Wed Sep 12 16:39:47 2012  8f8531a2
  227542016     1048576    525336576     512000   1024   27  Wed Sep 12 16:39:47 2012  8f8531a2
  403701760     1048576   8590983168    2097152   4096    3  Tue Jan 18 15:06:12 2011  e1e13f16  boot
  412091392     1048576    525336576     512000   1024   49  Wed Sep 12 16:39:47 2012  8f8531a2
  525337600   525336576   9115271168    2097152   4096    0  Tue Jan 18 15:06:12 2011  e1e13f16  root_fc13
  659554304   525336576   9115271168    2097152   4096    1  Tue Jan 18 15:06:12 2011  e1e13f16  root_fc13
  659750912   525533184  17705402368    4194304   4096    1  Thu Jan 13 14:29:26 2011  6740a155
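
A rough sketch of how a listing like this could be sorted out
mechanically, assuming each superblock has been joined onto a single line.
It groups rows by UUID, start, end and block count, and works out the
block number that e2fsck's "-b" option expects.  The function names are
made up, and reading byte_offset as the position of the superblock on the
device and byte_start as the computed start of its filesystem is inferred
from the numbers in the listing (for example, the group 1 copy of the
4096-byte-block filesystem lands exactly on block 32768), not taken from
findsuper documentation.

```python
from collections import defaultdict

# A few rows copied from the listing above, one superblock per line.
SAMPLE = """\
    1049600    1048576   525336576    512000  1024   0 Wed Sep 12 16:39:47 2012 8f8531a2
    9438208    1048576   525336576    512000  1024   1 Wed Sep 12 16:39:47 2012 8f8531a2
  135266304    1048576  8590983168   2097152  4096   1 Tue Jan 18 15:06:12 2011 e1e13f16 boot
  403701760    1048576  8590983168   2097152  4096   3 Tue Jan 18 15:06:12 2011 e1e13f16 boot
  525337600  525336576  9115271168   2097152  4096   0 Tue Jan 18 15:06:12 2011 e1e13f16 root_fc13
"""


def parse_row(line):
    """Split one row into named fields (six numbers, date, uuid, optional label)."""
    f = line.split()
    return {
        "byte_offset": int(f[0]), "byte_start": int(f[1]), "byte_end": int(f[2]),
        "fs_blocks": int(f[3]), "blksz": int(f[4]), "grp": int(f[5]),
        "time": " ".join(f[6:11]),            # five date tokens
        "uuid": f[11],
        "label": f[12] if len(f) > 12 else "",
    }


def group_by_filesystem(text):
    """Group rows that share uuid, start, end and block count."""
    groups = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        if not fields or not fields[0].isdigit():
            continue                          # skip blank lines and the header
        row = parse_row(line)
        key = (row["uuid"], row["byte_start"], row["byte_end"], row["fs_blocks"])
        groups[key].append(row)
    return groups


def superblock_block(row):
    """Block number of this superblock, counted from the filesystem start."""
    return (row["byte_offset"] - row["byte_start"]) // row["blksz"]


for key, rows in group_by_filesystem(SAMPLE).items():
    print(key[0], [(r["grp"], superblock_block(r)) for r in rows])
```

Note that the same short UUID can show up with two different start
offsets, as e1e13f16 does here, which is why the start and end are part of
the grouping key.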



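Then run "e2fsck -fn -b {block} -B 4096 /dev/XXX" for one of the MDT
superblocks, taking {block} and the "-B" block size from the findsuper
output.  The "-n" option keeps this pass read-only; running again without
"-n" lets e2fsck write its repairs, which will clobber the old
superblocks.  That will potentially recover some of your old MDT
filesystem into lost+found, and you can move these files into a directory
called "ROOT" at the top.  Use "getfattr" to extract the filenames from
the "link" xattr.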
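
A hedged sketch of decoding the raw "link" xattr once you have its bytes
(for example from "getfattr -e hex", or from os.getxattr() on the
"trusted.link" attribute).  The on-disk layout assumed here is recalled
from the Lustre 2.x headers and is not spelled out in this message: a
24-byte host-endian header (magic, record count, total length, two
reserved words) followed by packed records, each with a 2-byte big-endian
record length, a 16-byte big-endian parent FID, and the file name.  Please
verify the magic value and layout against your own Lustre source before
relying on it.  The function name is made up.

```python
import struct

LINK_EA_MAGIC = 0x11EAF1DF          # assumed value; check your Lustre headers
HEADER = struct.Struct("<IIQII")    # magic, reccount, total len, 2 reserved (assumed little-endian)
FID = struct.Struct(">QII")         # parent FID: seq, oid, ver (assumed big-endian)


def parse_link_ea(blob):
    """Return [(parent_fid, name)] for each hard link recorded in blob."""
    magic, reccount, _total, _r1, _r2 = HEADER.unpack_from(blob, 0)
    if magic != LINK_EA_MAGIC:
        raise ValueError("unexpected link xattr magic 0x%x" % magic)
    links = []
    pos = HEADER.size
    for _ in range(reccount):
        # Record length covers the length field, the FID and the name.
        (reclen,) = struct.unpack_from(">H", blob, pos)
        if reclen < 2 + FID.size or pos + reclen > len(blob):
            raise ValueError("corrupt link record at offset %d" % pos)
        seq, oid, ver = FID.unpack_from(blob, pos + 2)
        name = blob[pos + 2 + FID.size:pos + reclen]
        links.append(("[0x%x:0x%x:0x%x]" % (seq, oid, ver),
                      name.decode("utf-8", "replace")))
        pos += reclen
    return links


# Hypothetical usage on a mounted ldiskfs MDT, as root (path is a placeholder):
#   import os
#   print(parse_link_ea(os.getxattr("/mnt/mdt/lost+found/#12345", "trusted.link")))
```

Each entry gives the name the inode had and the FID of its parent
directory, which is what lets the tree be rebuilt under "ROOT".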

Hope this helps.  This is one reason why I encourage everyone to make full
"dd" backups of their MDT device.  It doesn't take much space, but is
critical to the whole filesystem.

Cheers, Andreas

>> -----Original Message-----
>> From: Colin Faber [mailto:colin_faber at xyratex.com]
>> Sent: Wednesday, June 26, 2013 5:08 PM
>> To: Andrus, Brian Contractor
>> Cc: lustre-discuss at lists.lustre.org
>> Subject: Re: [Lustre-discuss] Completely lost MGT/MDT
>> 
>> Can you describe the failure in more detail?
>> 
>> "Andrus, Brian Contractor" <bdandrus at nps.edu> wrote:
>> 
>> >All,
>> >
>> >We have a sizeable filesystem and during a hardware upgrade, our MDT
>> >disk was completely lost.
>> >I am trying to find if and how to recover from such an event, but am
>> >not finding anything.
>> >
>> >We were running lustre 2.3 and have upgraded to 2.4 (or are in the
>> >process of it).
>> >
>> >Can anyone point me in the right direction here?
>> >
>> >Thanks in advance,
>>


-- 
Andreas Dilger

Lustre Software Architect
Intel High Performance Data Division


-------------- next part --------------
A non-text attachment was scrubbed...
Name: findsuper.c
Type: application/octet-stream
Size: 8321 bytes
Desc: findsuper.c
URL: <http://lists.lustre.org/pipermail/lustre-discuss-lustre.org/attachments/20130628/2402c4ab/attachment.obj>

