[Lustre-discuss] recovering formatted OST

Wojciech Turek wjt27 at cam.ac.uk
Thu Oct 21 09:30:39 PDT 2010


Hi Andreas,

I restarted fsck after the segfault; it ran for several hours and then
segfaulted again.

Pass 3A: Optimizing directories
Failed to optimize directory ??? (73031): EXT2 directory corrupted
Failed to optimize directory ??? (73041): EXT2 directory corrupted
Failed to optimize directory ??? (75203): EXT2 directory corrupted
Failed to optimize directory ??? (75357): EXT2 directory corrupted
Failed to optimize directory ??? (75744): EXT2 directory corrupted
Failed to optimize directory ??? (75806): EXT2 directory corrupted
Failed to optimize directory ??? (75825): EXT2 directory corrupted
Failed to optimize directory ??? (75913): EXT2 directory corrupted
Failed to optimize directory ??? (75926): EXT2 directory corrupted
Failed to optimize directory ??? (76034): EXT2 directory corrupted
Failed to optimize directory ??? (76083): EXT2 directory corrupted
Failed to optimize directory ??? (76142): EXT2 directory corrupted
Failed to optimize directory ??? (76266): EXT2 directory corrupted
Failed to optimize directory ??? (76501): EXT2 directory corrupted
Failed to optimize directory ??? (77133): EXT2 directory corrupted
Failed to optimize directory ??? (77212): EXT2 directory corrupted
Failed to optimize directory ??? (77817): EXT2 directory corrupted
Failed to optimize directory ??? (77984): EXT2 directory corrupted
Failed to optimize directory ??? (77985): EXT2 directory corrupted
Segmentation fault

I noticed that the stack size limit was quite low, so I have now changed it to
unlimited; I also increased the limit on the number of open files (maybe that
will help).
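
For reference, a minimal sketch of the limit changes described above (the
open-files value shown is only illustrative):

    ulimit -s unlimited   # stack size
    ulimit -n 65536       # open file descriptors (example value)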

Now I have another problem: after the last segfault I cannot restart fsck
because of MMP.

e2fsck -fy /dev/scratch2_ost16vg/ost16lv
e2fsck 1.41.10.sun2 (24-Feb-2010)
e2fsck: MMP: fsck being run while trying to open
/dev/scratch2_ost16vg/ost16lv

The superblock could not be read or does not describe a correct ext2
filesystem.  If the device is valid and it really contains an ext2
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 32768 <device>


Also, when I try to access the filesystem via debugfs, it fails:

debugfs -c -R 'ls' /dev/scratch2_ost16vg/ost16lv
debugfs 1.41.10.sun2 (24-Feb-2010)
/dev/scratch2_ost16vg/ost16lv: MMP: fsck being run while opening filesystem
ls: Filesystem not open

Is there a way to clear the MMP flag so that fsck is allowed to run?
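
One approach that might work here, assuming this Lustre-patched e2fsprogs
supports the clear-mmp extended option of tune2fs, would be to reset the MMP
block back to the clean state before re-running fsck:

    # assumes tune2fs from e2fsprogs 1.41.10.sun2 understands -E clear-mmp
    tune2fs -f -E clear-mmp /dev/scratch2_ost16vg/ost16lv

This should only be done while the device is definitely not mounted anywhere
and no other fsck is running against it.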

Best regards,

Wojciech


On 21 October 2010 17:16, Andreas Dilger <andreas.dilger at oracle.com> wrote:

> Having a bit more context would help to see where the problem is.  It may
> just be that, with the other filesystems formatted on top of the original,
> the filesystem is unrecoverable.
>
> E2fsck ran out of memory, but there shouldn't be a 2GB directory in the
> filesystem either, so it seems things are pretty messed up.
>
> It seems that some semblance of a filesystem was restored. You could try
> re-running e2fsck with more RAM or swap, or at least you could try looking
> at the filesystem with debugfs to see what is there.
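
If memory is the limiting factor, here is a minimal sketch of one way to give
e2fsck more headroom (the sizes and paths are illustrative, and it assumes
this e2fsprogs build honours the [scratch_files] section of /etc/e2fsck.conf):

    # add a temporary swap file (16GB here, purely as an example)
    dd if=/dev/zero of=/var/tmp/e2fsck.swap bs=1M count=16384
    mkswap /var/tmp/e2fsck.swap
    swapon /var/tmp/e2fsck.swap

    # and/or let e2fsck keep its large tables on disk instead of in RAM,
    # by adding this section to /etc/e2fsck.conf:
    #   [scratch_files]
    #   directory = /var/cache/e2fsck
    mkdir -p /var/cache/e2fsck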
>
> If there are lots of files in lost+found and they have xattrs attached,
> that would be a good sign. If "stats" shows some groups with in-use inodes
> later in the filesystem, then you could check some with "stat" for Lustre
> xattrs, or use "dump" to look at the contents. If none of this shows any
> results, you may just have to give it up as lost.
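
A sketch of the debugfs commands referred to above (the inode number is just
an example taken from the fsck output earlier in this thread):

    debugfs -c -R 'stats' /dev/scratch2_ost16vg/ost16lv
    debugfs -c -R 'stat <73031>' /dev/scratch2_ost16vg/ost16lv
    debugfs -c -R 'dump <73031> /tmp/obj.73031' /dev/scratch2_ost16vg/ost16lv

("stat" prints the inode, including any extended attributes stored in it;
"dump" writes the inode's data blocks out to a regular file.)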
>
> Cheers, Andreas
>
> On 2010-10-21, at 6:26, Wojciech Turek <wjt27 at cam.ac.uk> wrote:
>
> I ran e2fsck -fy on the recreated LVM volume, but it segfaulted after
> running for some time:
>
> ...
> Block #2098188 (938180923) causes directory to be too big.  CLEARED.
> Error storing directory block information (inode=208387, block=0,
> num=261770): Memory allocation failed
> Recreate journal? yes
>
> Creating journal (32768 blocks):  Done.
>
> *** journal has been re-created - filesystem is now ext3 again ***
> e2fsck: aborted
> Segmentation fault
>
>
>
> rpm -qa | grep progs
> e2fsprogs-1.41.10.sun2-0redhat.x86_64
> e2fsprogs-devel-1.41.10.sun2-0redhat.x86_64
>
>
> Any idea what may have happened?
>
> Cheers
>
> Wojciech
>
> On 21 October 2010 03:32, Andreas Dilger <andreas.dilger at oracle.com> wrote:
>
>> Probably LVM will refuse to create a whole-device PV if there is a
>> partition table.
>>
>> Cheers, Andreas
>>
>> On 2010-10-20, at 18:31, Wojciech Turek <wjt27 at cam.ac.uk> wrote:
>>
>> Hi Andreas,
>>
>> If I am going to recreate the LVM on the whole device (as it was originally
>> created), do I still need to overwrite the MBR with zeros prior to that? I
>> guess creating the LVM will overwrite it, but I am asking just to make sure.
>>
>> Wojciech
>>
>> On 20 October 2010 18:40, Andreas Dilger <andreas.dilger at oracle.com> wrote:
>>
>>> On 2010-10-20, at 11:36, Wojciech Turek wrote:
>>> > Your help is most appreciated, Andreas. May I ask one more question?
>>> > I would like to perform the recovery procedure on an image of the disk
>>> (I am making it using dd) rather than on the physical device. In order to
>>> do that, is it enough to bind the image to a loop device and use that loop
>>> device as if it were a physical device?
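
For reference, the approach being asked about would look roughly like this
(the image path and loop device are hypothetical):

    dd if=/dev/sdc of=/backup/ost16.img bs=1M conv=noerror,sync
    losetup /dev/loop0 /backup/ost16.img
    # the loop device could then be used in place of /dev/sdc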
>>>
>>> I'm not sure that is 100% safe.  Having an image may cause LVM to create
>>> the LVs with different parameters for some reason.  Instead, I'd keep the
>>> image as a backup and do the recovery on the original device.  Also, the
>>> original device is much more likely to run e2fsck faster, which will help
>>> you get any remaining data back more quickly.
>>>
>>> > On 20 October 2010 17:41, Andreas Dilger <andreas.dilger at oracle.com> wrote:
>>> >> On 2010-10-20, at 10:15, Wojciech Turek <wjt27 at cam.ac.uk> wrote:
>>> >> On 20 October 2010 16:32, Andreas Dilger <andreas.dilger at oracle.com> wrote:
>>> >> Right - you need to recreate the LV exactly as it was before. If you
>>> created it all at once on the whole LUN, then it is likely to be allocated
>>> in a linear way. If there are multiple LVs on the same LUN and they were
>>> expanded after use, the chance of recovering them is very low.
>>> >> There was one LVM volume on that LUN; I created it using the following commands:
>>> >>
>>> >> pvcreate /dev/sdc
>>> >> vgcreate ost16vg /dev/sdc
>>> >> lvcreate --name ost16v -l 100%VG ost16vg
>>> >>
>>> >> So in order to recreate that LVM on the formatted LUN, I need to repeat
>>> the above steps, is that right?
>>> >
>>> > If you know the exact LVM command then you probably don't need
>>> findsuper at all, since you should get back your original LV. The findsuper
>>> tool is useful if you don't know the original partition layout.
>>> >
>>> >> That said, if there were filesystems formatted in each partition, the
>>> amount of data loss may be large. You may have some saving grace if the
>>> first partitions are very small and fit inside the space previously used by
>>> the 400MB journal.
>>> >> Unfortunately the new partitions use much more space than 400MB:
>>> >>    8    32 7809904640 sdc
>>> >>    8    33   10484719 sdc1
>>> >>    8    34    4193280 sdc2
>>> >>    8    35    4193280 sdc3
>>> >>    8    36    8387584 sdc4
>>> >>    8    37 7782640640 sdc5
>>> >
>>> > The only good news is that the new filesystems will be offset from the
>>> original filesystem due to the LVM metadata, and you are more likely to have
>>> newer data away from the start of the filesystem, so there is some hope of
>>> getting some data back.
>>> >
>>> >
>>> >> On 2010-10-20, at 9:06, Wojciech Turek < <wjt27 at cam.ac.uk><wjt27 at cam.ac.uk>
>>> wjt27 at cam.ac.uk> wrote:
>>> >>
>>> >>> Thank you for the quick reply.
>>> >>> Unfortunately, all of the partitions were formatted with ext3. Also, I
>>> didn't mention earlier that the OST was placed on an LVM volume, which is
>>> now gone because the installation script formatted the physical device. I
>>> understand that this complicates things even further. In that case I guess
>>> I first need to try to recover the LVM information, otherwise fsck will not
>>> be able to find anything - is that right?
>>> >>>
>>> >>> Best regards,
>>> >>>
>>> >>> Wojciech
>>> >>>
>>> >>> On 20 October 2010 08:46, Andreas Dilger <andreas.dilger at oracle.com> wrote:
>>> >>> On 2010-10-19, at 17:01, Wojciech Turek wrote:
>>> >>> > Due to a local disk failure in an OSS, one of our /scratch OSTs
>>> was formatted by an automatic installation script. This script created 5
>>> small partitions and a 6th partition consisting of the remaining space on
>>> that OST. Nothing else has been written to that device since then. Is there
>>> a way to recover any data from that OST?
>>> >>>
>>> >>> Your best bet is to first make a full "dd" backup of the OST to a new
>>> device (for safety), then restore the original partition table.  If there
>>> was not originally a partition table, then you can just erase the new
>>> partitions:
>>> >>>
>>> >>>  dd if=/dev/zero of=/dev/XXX bs=512 count=1
>>> >>>
>>> >>> Then run e2fsck -fy, followed by "ll_recover_lost_found_objs" (from a
>>> newer Lustre RPM, if you don't already have it).  It is likely that you
>>> will get some or most of the data back.  This depends heavily on exactly
>>> what was written over the original filesystem.
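
Putting those steps together, the sequence would look roughly like this (the
device name matches the placeholder above, the mount point is hypothetical,
and it assumes the OST can be mounted as ldiskfs so that
ll_recover_lost_found_objs can be pointed at its lost+found directory):

    dd if=/dev/XXX of=/backup/ost_image.dd bs=1M conv=noerror,sync  # safety copy
    dd if=/dev/zero of=/dev/XXX bs=512 count=1                      # wipe new partition table
    e2fsck -fy /dev/XXX
    mount -t ldiskfs /dev/XXX /mnt/ost
    ll_recover_lost_found_objs -d /mnt/ost/lost+found
    umount /mnt/ost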
>>> >>>
>>> >>> If it was just a new partition table, there should be relatively
>>> little damage (ext3 is very robust this way, and can repair itself so long
>>> as the starting alignment is correct).  If there were filesystems formatted
>>> in each of these partitions, then the amount of data available will be
>>> reduced significantly.
>>> >>>
>>> >>> Cheers, Andreas
>>> >>> --
>>> >>> Andreas Dilger
>>> >>> Lustre Technical Lead
>>> >>> Oracle Corporation Canada Inc.
>>> >>>
>>> >>>
>>>
>>