[Lustre-discuss] Completely lost MGT/MDT
Dilger, Andreas
andreas.dilger at intel.com
Fri Jun 28 15:24:51 PDT 2013
On 2013/28/06 3:25 PM, "Andrus, Brian Contractor" <bdandrus at nps.edu> wrote:
>Basically, I was adding capacity to a system while doing a fresh install.
>Turns out /dev/sda which used to be the disk in the bottom slot became
>the disk in the top slot instead.
>That happened to be where the MDT was, which was promptly repartitioned
>and formatted.
>
>Not exactly something I was expecting....
Presumably you have no backups or snapshots of the MDT device? Lustre can
handle a lot of inconsistency between the MDT and OSTs, even without
running lfsck.
Also, there was once a similar situation with a reformatted MDT that was
partly recovered using the "ext3grep" utility. This allowed finding the
filename->inode mappings in the dirents in directory leaf blocks, and the
".." dirent allowed connecting the parent directories. In Lustre 2.x, the
"link" xattr on the MDT inodes could also be used to recover the filenames
even if the directory entries are lost.
This won't help as much if the whole disk has been overwritten by an OS
install, but if only part of the MDT was overwritten you may be surprised
how much is recoverable with ext4.
The first order of business is to make a copy of the whole disk before you
make any further changes (this lets you experiment and start over without
losing any data if things go badly).
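As a rough illustration of that first step, here is a minimal Python sketch that copies a device (or any file) to an image in fixed-size chunks and returns a SHA-256 digest so the copy can be checked later. The function name, chunk size, and paths are assumptions for illustration only; in practice plain "dd" does the same job.

```python
import hashlib

CHUNK = 4 * 1024 * 1024  # 4 MiB per read; an arbitrary choice for this sketch


def image_device(src_path, dst_path, chunk=CHUNK):
    """Copy src_path to dst_path byte for byte.

    Returns (bytes_copied, sha256_hex) so the image can be verified later.
    Hypothetical helper, not part of any Lustre or e2fsprogs tool.
    """
    digest = hashlib.sha256()
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            buf = src.read(chunk)
            if not buf:  # end of device or file
                break
            dst.write(buf)
            digest.update(buf)
            copied += len(buf)
    return copied, digest.hexdigest()
```

A call such as image_device("/dev/XXX", "/backup/mdt.img") would need read access to the device and enough free space for the image; both paths are placeholders.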
Repartition the disk as it was before (possibly without any partition
table at all for Lustre, or it could be dumped into an image file if not
too huge). Then build and run the "findsuper" utility from the e2fsprogs
code (I've attached it here) and try and find any existing (old)
superblocks from before the reformat. You can tell superblocks from the
same filesystem by the same start/end/blocks and increasing group number:
byte_offset  byte_start     byte_end  fs_blocks  blksz  grp  mkfs/mount_time           sb_uuid   label
    1049600     1048576    525336576     512000   1024    0  Wed Sep 12 16:39:47 2012  8f8531a2
    9438208     1048576    525336576     512000   1024    1  Wed Sep 12 16:39:47 2012  8f8531a2
   26215424     1048576    525336576     512000   1024    3  Wed Sep 12 16:39:47 2012  8f8531a2
   42992640     1048576    525336576     512000   1024    5  Wed Sep 12 16:39:47 2012  8f8531a2
   59769856     1048576    525336576     512000   1024    7  Wed Sep 12 16:39:47 2012  8f8531a2
   76547072     1048576    525336576     512000   1024    9  Wed Sep 12 16:39:47 2012  8f8531a2
  135266304     1048576   8590983168    2097152   4096    1  Tue Jan 18 15:06:12 2011  e1e13f16  boot
  210764800     1048576    525336576     512000   1024   25  Wed Sep 12 16:39:47 2012  8f8531a2
  227542016     1048576    525336576     512000   1024   27  Wed Sep 12 16:39:47 2012  8f8531a2
  403701760     1048576   8590983168    2097152   4096    3  Tue Jan 18 15:06:12 2011  e1e13f16  boot
  412091392     1048576    525336576     512000   1024   49  Wed Sep 12 16:39:47 2012  8f8531a2
  525337600   525336576   9115271168    2097152   4096    0  Tue Jan 18 15:06:12 2011  e1e13f16  root_fc13
  659554304   525336576   9115271168    2097152   4096    1  Tue Jan 18 15:06:12 2011  e1e13f16  root_fc13
  659750912   525533184  17705402368    4194304   4096    1  Thu Jan 13 14:29:26 2011  6740a155
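To make the "same start/end/blocks" rule concrete, here is a small Python sketch that groups findsuper rows by filesystem and lists the group numbers seen for each. It assumes every row sits on a single line with the columns shown in the header above; the function names and the shortened sample are illustrative, not part of findsuper itself.

```python
from collections import defaultdict

# A few rows from the listing above, one superblock per line.
SAMPLE = """\
1049600 1048576 525336576 512000 1024 0 Wed Sep 12 16:39:47 2012 8f8531a2
9438208 1048576 525336576 512000 1024 1 Wed Sep 12 16:39:47 2012 8f8531a2
135266304 1048576 8590983168 2097152 4096 1 Tue Jan 18 15:06:12 2011 e1e13f16 boot
525337600 525336576 9115271168 2097152 4096 0 Tue Jan 18 15:06:12 2011 e1e13f16 root_fc13
659554304 525336576 9115271168 2097152 4096 1 Tue Jan 18 15:06:12 2011 e1e13f16 root_fc13
"""


def parse_row(fields):
    """Turn one whitespace-split row into a dict (six numbers, date, uuid, label)."""
    offset, start, end, blocks, blksz, grp = (int(x) for x in fields[:6])
    return {
        "byte_offset": offset, "byte_start": start, "byte_end": end,
        "fs_blocks": blocks, "blksz": blksz, "grp": grp,
        "time": " ".join(fields[6:11]),   # e.g. "Wed Sep 12 16:39:47 2012"
        "uuid": fields[11],
        "label": " ".join(fields[12:]),   # empty string when there is no label
    }


def group_superblocks(text):
    """Map (byte_start, byte_end, fs_blocks, blksz, uuid) -> sorted group numbers."""
    groups = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 12 or not fields[0].isdigit():
            continue  # skip the header line and anything malformed
        row = parse_row(fields)
        key = (row["byte_start"], row["byte_end"], row["fs_blocks"],
               row["blksz"], row["uuid"])
        groups[key].append(row["grp"])
    return {key: sorted(grps) for key, grps in groups.items()}


if __name__ == "__main__":
    for key, grps in group_superblocks(SAMPLE).items():
        print(key, "groups:", grps)
```

Grouping on the full tuple rather than on the UUID alone matters in this listing, because the same short UUID (e1e13f16) shows up with two different start offsets and labels.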
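Then, run "e2fsck -fn -b {block} -B 4096 /dev/XXX" for one of the MDT
superblocks (4096 here; use the blksz value that findsuper reports for that
filesystem). The "-n" flag makes this a read-only dry run. Once the output
looks reasonable, run it again without "-n" to actually repair the
filesystem (which will clobber the old superblocks). This will potentially
recover some of your old MDT filesystem into lost+found, and you can move
these into a directory called "ROOT" at the top level. Use "getfattr" to
extract the filenames from the "link" xattr.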
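The message does not spell out how to get {block} from the findsuper listing. One plausible reading, assuming the device handed to e2fsck begins at the filesystem's byte_start, is that the block number is the superblock's distance from byte_start divided by the block size. The Python sketch below does that arithmetic and assembles the read-only command line without running it; the helper names are made up for illustration.

```python
def superblock_block(byte_offset, byte_start, blksz):
    """Block number of a superblock relative to the start of its filesystem."""
    return (byte_offset - byte_start) // blksz


def e2fsck_dry_run_args(device, byte_offset, byte_start, blksz):
    """Build (but do not execute) a read-only e2fsck command line.

    -f forces a check, -n answers "no" to everything (no writes),
    -b selects the superblock, -B gives the block size.
    """
    block = superblock_block(byte_offset, byte_start, blksz)
    return ["e2fsck", "-fn", "-b", str(block), "-B", str(blksz), device]


if __name__ == "__main__":
    # Group 1 rows from the listing: a 1 KiB-block and a 4 KiB-block filesystem.
    print(superblock_block(9438208, 1048576, 1024))    # 8193
    print(superblock_block(135266304, 1048576, 4096))  # 32768
    print(" ".join(e2fsck_dry_run_args("/dev/XXX", 9438208, 1048576, 1024)))
```

The two results match the usual first backup superblock locations for 1 KiB and 4 KiB block sizes, which is some reassurance that the arithmetic is on the right track.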
Hope this helps. This is one reason why I encourage everyone to make full
"dd" backups of their MDT device. It doesn't take much space, but is
critical to the whole filesystem.
Cheers, Andreas
>> -----Original Message-----
>> From: Colin Faber [mailto:colin_faber at xyratex.com]
>> Sent: Wednesday, June 26, 2013 5:08 PM
>> To: Andrus, Brian Contractor
>> Cc: lustre-discuss at lists.lustre.org
>> Subject: Re: [Lustre-discuss] Completely lost MGT/MDT
>>
>> Can you describe the failure in more detail?
>>
>> "Andrus, Brian Contractor" <bdandrus at nps.edu> wrote:
>>
>> >All,
>> >
>> >We have a sizeable filesystem and during a hardware upgrade, our MDT
>> >disk was completely lost.
>> >I am trying to find if and how to recover from such an event, but am not
>> >finding anything.
>> >
>> >We were running lustre 2.3 and have upgraded to 2.4 (or are in the process
>> >of it).
>> >
>> >Can anyone point me in the right direction here?
>> >
>> >Thanks in advance,
>>
--
Andreas Dilger
Lustre Software Architect
Intel High Performance Data Division
-------------- next part --------------
A non-text attachment was scrubbed...
Name: findsuper.c
Type: application/octet-stream
Size: 8321 bytes
Desc: findsuper.c
URL: <http://lists.lustre.org/pipermail/lustre-discuss-lustre.org/attachments/20130628/2402c4ab/attachment.obj>