[Lustre-discuss] [HPDD-discuss] Recovering a failed OST
Bob Ball
ball at umich.edu
Thu May 22 06:57:37 PDT 2014
Thanks for the advice. Fortunately, the OST was completely drained of
files before all heck broke loose. With the help of the manual, a
couple of lustre list threads, and some long-lost memories of a similar
situation a few years back, I was able to bring the OST alive again,
albeit still read-only for the time being (I've had two days off, and
now I need to I/O-test it before I'll trust it again).
Cheers,
bob
On 5/20/2014 10:49 AM, Martin Hecht wrote:
> Hi bob,
>
> just to make sure: have you already followed
> http://wiki.lustre.org/index.php/Handling_File_System_Errors,
> especially the e2fsck steps linked there?
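
A minimal sketch of that check-before-repair pattern, exercised on a scratch
loopback image rather than a real OST (the image path is a throwaway
assumption; on a production system the target and any extra e2fsck flags are
site-specific):

```shell
# Check-before-repair sketch on a scratch ext2 image (not a real OST).
export PATH="$PATH:/sbin:/usr/sbin"   # mke2fs/e2fsck often live in sbin
img=$(mktemp)
truncate -s 8M "$img"
mke2fs -F -q "$img"          # build a throwaway filesystem image
e2fsck -fn "$img"            # -n: forced read-only check first (safe)
e2fsck -fp "$img"            # -p: preen, only after reviewing the -n pass
rm -f "$img"
```

On the real OST the target would be the unmounted block device (e.g.
/dev/sde), and you would review the -n output before letting -p change
anything.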
>
> If you did *not yet* do any write operation to the damaged OST, you
> might want to back up the whole OST first, using dd for instance (if
> the underlying hardware still permits it).
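
One way to take that block-level backup (device and target paths here are
assumptions; the sketch is demonstrated on an ordinary file so it can run
anywhere):

```shell
# Hedged sketch: image an OST block device before attempting any repair.
# On a real system this would be something like:
#   dd if=/dev/sde of=/backup/ost.img bs=4M conv=noerror,sync
# conv=noerror keeps dd going past read errors; conv=sync pads short
# reads with zeros so offsets in the image stay aligned.

# Self-contained demo on a regular file:
src=$(mktemp); dst=$(mktemp)
head -c 1048576 /dev/urandom > "$src"
dd if="$src" of="$dst" bs=64K conv=noerror,sync 2>/dev/null
cmp -s "$src" "$dst" && echo "backup image verified"
rm -f "$src" "$dst"
```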
>
> If the situation described (empty O directory, lost LAST_ID entry)
> occurred *after* the e2fsck, and you find lots of files in lost+found
> when you mount the OST as ldiskfs, you can use
> ll_recover_lost_found_objs to put them back in the correct place
> (http://manpages.ubuntu.com/manpages/precise/man1/ll_recover_lost_found_objs.1.html)
> - it is part of the lustre distribution. I once had to run it
> several times to fully restore the structure below O.
>
> best regards,
> Martin
>
> On 05/19/2014 08:24 PM, Bob Ball wrote:
>> Oh, better still, as I kept looking, and the low-level panic
>> retreated, I found this on the mdt:
>>
>> [root@lmd02 ~]# lctl get_param osc.*.prealloc_next_id
>> ...
>> osc.umt3-OST0025-osc.prealloc_next_id=6778336
>>
>> So, unless someone tells me that I am way off base, I'm going to
>> assume that this is a valid starting point, and proceed to get my
>> file system back online.
>>
>> bob
>>
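
For what it's worth, the LAST_ID file that the manual's procedure has you
rewrite is just that number stored as an 8-byte little-endian integer. A
sketch of producing it on a scratch file (the value is the prealloc_next_id
quoted above; whether to write next_id or next_id minus one should be
confirmed against manual section 26.3.4):

```shell
# Sketch: encode a candidate LAST_ID as the 8-byte little-endian integer
# the OST stores in O/0/LAST_ID. Scratch file only -- which exact value
# to write (e.g. next_id vs next_id-1) must be checked against the manual.
val=6778336
tmp=$(mktemp)
hex=$(printf '%016x' "$val")
for i in 14 12 10 8 6 4 2 0; do     # emit bytes in little-endian order
  printf "\\x${hex:$i:2}"
done > "$tmp"
# Read it back as a decimal integer (od uses host byte order; this assumes
# a little-endian machine, as the manual's own od examples do):
od -An -td8 "$tmp"                  # prints 6778336
rm -f "$tmp"
```

On the real OST, that 8-byte file would be put in place at O/0/LAST_ID with
the filesystem mounted as ldiskfs, as in the commands quoted further down
the thread.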
>> On 5/19/2014 2:05 PM, Bob Ball wrote:
>>> Google first, ask later. I found this in the manuals:
>>>
>>>
>>> 26.3.4 Fixing a Bad LAST_ID on an OST
>>>
>>> The procedures there spell out pretty well what I must do, so this
>>> should be relatively straightforward. But does this comment refer
>>> to just this OST, or to all OSTs?
>>> *Note - *The file system must be stopped on all servers before
>>> performing this procedure.
>>>
>>> So, is this the best approach to follow, allowing for the fact that
>>> there is nothing at all left on the OST, or is there a better
>>> shortcut to choosing an appropriate LAST_ID?
>>>
>>> Thanks again,
>>> bob
>>>
>>>
>>> On 5/19/2014 1:50 PM, Bob Ball wrote:
>>>> I need to completely remake a failed OST. I have done this in the
>>>> past, but this time, the disk failed in such a way that I cannot
>>>> fully get recovery information from the OST before I destroy and
>>>> recreate. In particular, I am unable to recover the LAST_ID file,
>>>> but successfully retrieved the last_rcvd and CONFIGS/* files.
>>>>
>>>> mount -t ldiskfs /dev/sde /mnt/ost
>>>> pushd /mnt/ost
>>>> cd O
>>>> cd 0
>>>> cp -p LAST_ID /root/reformat/sde
>>>>
>>>> The O directory exists, but it is empty. What can I do concerning
>>>> this missing LAST_ID file? I mean, I probably have something,
>>>> somewhere, from some previous recovery, but that is way, way out of
>>>> date.
>>>>
>>>> My intent is to recreate this OST with the same index, and then put
>>>> it back into production. All files were moved off the OST before
>>>> reaching this state, so nothing else needs to be recovered here.
>>>>
>>>> Thanks,
>>>> bob
>>>>
>>>> _______________________________________________
>>>> HPDD-discuss mailing list
>>>> HPDD-discuss at lists.01.org
>>>> https://lists.01.org/mailman/listinfo/hpdd-discuss
>>>>
>>>
>>
>