[Lustre-discuss] [HPDD-discuss] Recovering a failed OST

Martin Hecht hecht at hlrs.de
Tue May 20 07:49:37 PDT 2014


Hi Bob,

Just to make sure: have you already followed
http://wiki.lustre.org/index.php/Handling_File_System_Errors, especially
the steps for e2fsck linked there?

If you have *not yet* performed any write operation on the damaged OST,
you might want to back up the whole OST first, for instance with dd (if
the underlying hardware still permits it).
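
A minimal sketch of such a raw backup, assuming GNU dd. The function name
backup_ost and the destination path are hypothetical; /dev/sde is the
device named later in this thread. conv=noerror,sync tells dd to continue
past read errors and zero-pad the affected blocks, so offsets in the image
stay aligned with the device. A smaller bs loses less data around each bad
sector but runs more slowly.

```shell
#!/bin/sh
# Sketch only: copy a block device (or file) to an image file.
backup_ost() {
    src=$1   # source device, e.g. /dev/sde
    dst=$2   # destination image; must live on a different, healthy disk
    # noerror: keep going after read errors
    # sync:    pad short or failed reads with zeros up to the block size
    dd if="$src" of="$dst" bs=1M conv=noerror,sync
}

# Example invocation (hypothetical target path, not run here):
#   backup_ost /dev/sde /backup/ost-sde.img
```

Because of the zero padding, the image can be up to one block larger than
the source when the source size is not a multiple of bs.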

If the situation you describe (empty O directory, lost LAST_ID entry)
occurred *after* the e2fsck, and you find lots of files in lost+found
when you mount the OST as ldiskfs, you can use
ll_recover_lost_found_objs to put them back in the correct place
(http://manpages.ubuntu.com/manpages/precise/man1/ll_recover_lost_found_objs.1.html).
It is part of the Lustre distribution. I once had to run it several
times to restore the directory structure.
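
Running the tool repeatedly could be sketched as below, assuming the OST
is already mounted as ldiskfs (for example at /mnt/ost). The names
recover_until_stable and count_entries, the RECOVER_CMD variable, and the
limit of five passes are all made up for this illustration; only the
-d <directory> option comes from the man page linked above.

```shell
#!/bin/sh
# RECOVER_CMD is overridable only so the loop can be tried with a stub.
RECOVER_CMD=${RECOVER_CMD:-ll_recover_lost_found_objs}

# Number of entries (including dot files) in a directory.
count_entries() {
    ls -A "$1" | wc -l | tr -d ' '
}

# Re-run the recovery tool until lost+found is empty, a pass moves
# nothing more, or the pass limit is reached. Prints the passes run.
recover_until_stable() {
    dir=$1                # e.g. /mnt/ost/lost+found
    max_passes=${2:-5}    # arbitrary safety limit (assumption)
    pass=0
    before=$(count_entries "$dir")
    while [ "$pass" -lt "$max_passes" ] && [ "$before" -gt 0 ]; do
        "$RECOVER_CMD" -d "$dir" || return 1
        pass=$((pass + 1))
        after=$(count_entries "$dir")
        # No progress in this pass: stop and inspect by hand.
        if [ "$after" -eq "$before" ]; then
            break
        fi
        before=$after
    done
    echo "$pass"
}

# Example (not run here):
#   mount -t ldiskfs /dev/sde /mnt/ost
#   recover_until_stable /mnt/ost/lost+found
```

Anything still left in lost+found after the loop stops needs to be looked
at manually.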

best regards,
Martin

On 05/19/2014 08:24 PM, Bob Ball wrote:
> Oh, better still, as I kept looking, and the low-level panic
> retreated, I found this on the mdt:
>
> [root@lmd02 ~]# lctl get_param osc.*.prealloc_next_id
> ...
> osc.umt3-OST0025-osc.prealloc_next_id=6778336
>
> So, unless someone tells me that I am way off base, I'm going to
> proceed with the assumption that this is a valid starting point, and
> proceed to get my file system back online.
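
To pick out the value for a single OST from that lctl output, something
like the following could be used. This is only a sketch: next_id_for is a
made-up helper name, and it assumes the one-line-per-OSC "name=value"
output format shown above. Whether prealloc_next_id is the right value to
seed LAST_ID with remains the assumption stated in the quoted message.

```shell
#!/bin/sh
# Read `lctl get_param` output on stdin and print the prealloc_next_id
# value for the OST whose name is given as the first argument.
next_id_for() {
    ost=$1   # e.g. umt3-OST0025
    awk -F= -v key="osc.${ost}-osc.prealloc_next_id" '$1 == key { print $2 }'
}

# On the MDS (not run here):
#   lctl get_param osc.*.prealloc_next_id | next_id_for umt3-OST0025
```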
>
> bob
>
> On 5/19/2014 2:05 PM, Bob Ball wrote:
>> Google first, ask later.  I found this in the manuals:
>>
>>
>>       26.3.4 Fixing a Bad LAST_ID on an OST
>>
>> The procedures there spell out pretty well what I must do, so this
>> should be relatively straightforward.  But does this comment refer
>> to just this OST, or to all OSTs?
>> *Note:* The file system must be stopped on all servers before
>> performing this procedure.
>>
>> So, is this the best approach to follow, allowing for the fact that
>> there is nothing at all left on the OST, or is there a better short
>> cut to choosing an appropriate LAST_ID?
>>
>> Thanks again,
>> bob
>>
>>
>> On 5/19/2014 1:50 PM, Bob Ball wrote:
>>> I need to completely remake a failed OST.  I have done this in the
>>> past, but this time, the disk failed in such a way that I cannot
>>> fully get recovery information from the OST before I destroy and
>>> recreate.  In particular, I am unable to recover the LAST_ID file,
>>> but successfully retrieved the last_rcvd and CONFIGS/* files.
>>>
>>> mount -t ldiskfs /dev/sde /mnt/ost
>>> pushd /mnt/ost
>>> cd O
>>> cd 0
>>> cp -p LAST_ID /root/reformat/sde
>>>
>>> The O directory exists, but it is empty.  What can I do concerning
>>> this missing LAST_ID file?  I mean, I probably have something,
>>> somewhere, from some previous recovery, but that is way, way out of
>>> date.
>>>
>>> My intent is to recreate this OST with the same index, and then put
>>> it back into production.  All files were moved off the OST before
>>> reaching this state, so nothing else needs to be recovered here.
>>>
>>> Thanks,
>>> bob
>>>
>>> _______________________________________________
>>> HPDD-discuss mailing list
>>> HPDD-discuss at lists.01.org
>>> https://lists.01.org/mailman/listinfo/hpdd-discuss
>>>
>>
>
