[Lustre-discuss] [HPDD-discuss] Recovering a failed OST

Bob Ball ball at umich.edu
Thu May 22 06:57:37 PDT 2014


Thanks for the advice.  Fortunately, the OST was completely drained of 
files before all heck broke loose.  With the help of the manual, a 
couple of lustre list threads, and some long-lost memories of a similar 
situation a few years back, I was able to bring the OST alive again, 
albeit still read-only for the time being (2 days off for me, and now I 
need to IO test it before I'll trust it again).

Cheers,
bob

On 5/20/2014 10:49 AM, Martin Hecht wrote:
> Hi bob,
>
> just to make sure: have you already followed 
> http://wiki.lustre.org/index.php/Handling_File_System_Errors, 
> especially the steps for e2fsck linked there?
>
> If you did *not yet* do any write operation to the damaged OST, you 
> might want to back up the whole OST first, using dd for instance (if 
> the underlying hardware still permits it).
>
> If the situation described (empty O directory, lost LAST_ID entry) 
> occurred *after* the e2fsck, and you find lots of files in lost+found 
> when you mount the OST as ldiskfs, you can use 
> ll_recover_lost_found_objs to put them back in the correct place 
> (http://manpages.ubuntu.com/manpages/precise/man1/ll_recover_lost_found_objs.1.html) 
> - it is part of the lustre distribution. Once I had to run this 
> several times in order to restore the structure below.
>
> best regards,
> Martin
>
> On 05/19/2014 08:24 PM, Bob Ball wrote:
>> Oh, better still, as I kept looking, and the low-level panic 
>> retreated, I found this on the mdt:
>>
>> [root@lmd02 ~]# lctl get_param osc.*.prealloc_next_id
>> ...
>> osc.umt3-OST0025-osc.prealloc_next_id=6778336
>>
>> So, unless someone tells me that I am way off base, I'm going to 
>> proceed with the assumption that this is a valid starting point, and 
>> proceed to get my file system back online.
>>
>> bob
>>
>> On 5/19/2014 2:05 PM, Bob Ball wrote:
>>> Google first, ask later.  I found this in the manuals:
>>>
>>>
>>>       26.3.4 Fixing a Bad LAST_ID on an OST
>>>
>>> The procedures there spell out pretty well what I must do, so this 
>>> should be relatively straightforward.  But does this comment refer 
>>> to just this OST, or to all OSTs?
>>> Note: The file system must be stopped on all servers before 
>>> performing this procedure.
>>>
>>> So, is this the best approach to follow, allowing for the fact that 
>>> there is nothing at all left on the OST, or is there a better 
>>> shortcut for choosing an appropriate LAST_ID?
>>>
>>> Thanks again,
>>> bob
>>>
>>>
>>> On 5/19/2014 1:50 PM, Bob Ball wrote:
>>>> I need to completely remake a failed OST.  I have done this in the 
>>>> past, but this time, the disk failed in such a way that I cannot 
>>>> fully get recovery information from the OST before I destroy and 
>>>> recreate.  In particular, I am unable to recover the LAST_ID file, 
>>>> but successfully retrieved the last_rcvd and CONFIGS/* files.
>>>>
>>>> mount -t ldiskfs /dev/sde /mnt/ost
>>>> pushd /mnt/ost
>>>> cd O
>>>> cd 0
>>>> cp -p LAST_ID /root/reformat/sde
>>>>
>>>> The O directory exists, but it is empty.  What can I do concerning 
>>>> this missing LAST_ID file?  I mean, I probably have something, 
>>>> somewhere, from some previous recovery, but that is way, way out of 
>>>> date.
>>>>
>>>> My intent is to recreate this OST with the same index, and then put 
>>>> it back into production.  All files were moved off the OST before 
>>>> reaching this state, so nothing else needs to be recovered here.
>>>>
>>>> Thanks,
>>>> bob
>>>>
>>>> _______________________________________________
>>>> HPDD-discuss mailing list
>>>> HPDD-discuss at lists.01.org
>>>> https://lists.01.org/mailman/listinfo/hpdd-discuss
>>>>
>>
>
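
On the LAST_ID question in the quoted thread: the manual procedure (26.3.4) rebuilds LAST_ID by editing a hex dump by hand. Below is a minimal shell sketch of the same idea. It assumes that LAST_ID under O/0 holds a single unsigned 64-bit little-endian object ID, which should be checked against the manual for the Lustre version in use. The function name encode_last_id and the output file name LAST_ID.new are hypothetical. The value 6778336 is the prealloc_next_id reported by lctl in the thread. Whether to use that value as-is or a somewhat different one is a judgment call the thread leaves open.

```shell
#!/bin/sh
# Hypothetical helper: write a decimal object ID as 8 little-endian bytes.
# Assumption: LAST_ID stores one unsigned 64-bit little-endian value.
encode_last_id() {
    id=$1
    i=0
    while [ "$i" -lt 8 ]; do
        # Extract byte i (least significant first) and emit it as an octal escape.
        byte=$(( (id >> (8 * i)) & 255 ))
        printf "\\$(printf '%03o' "$byte")"
        i=$((i + 1))
    done
}

# Value taken from the lctl output quoted in the thread.
encode_last_id 6778336 > "${TMPDIR:-/tmp}/LAST_ID.new"

# Inspect the bytes before copying anything onto the ldiskfs-mounted OST.
od -An -tx1 "${TMPDIR:-/tmp}/LAST_ID.new"
```

For 6778336 (0x676de0) the dump should show the bytes e0 6d 67 followed by five zero bytes. The sketch only writes a scratch file. Copying it to O/0/LAST_ID on the OST mounted as ldiskfs is a separate, deliberate step.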

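
On Martin's suggestion to back up the whole OST with dd before any write operation, a minimal sketch follows. The function name backup_device and the image path in the usage comment are hypothetical. /dev/sde is the device named in the thread. The conv=noerror,sync options are one common choice for failing disks, not something the thread prescribes.

```shell
#!/bin/sh
# Hypothetical helper: raw-copy a block device (or file) into an image file.
# conv=noerror continues past read errors; conv=sync pads short or failed
# reads with zeros so later data keeps its offset in the image.
backup_device() {
    src=$1
    dst=$2
    dd if="$src" of="$dst" bs=4096 conv=noerror,sync 2>/dev/null
}

# Usage (paths are illustrative; the target needs room for the whole device):
# backup_device /dev/sde /backup/sde.img
```

Because of conv=sync, an image of a source whose size is not a multiple of the block size ends with zero padding. A small block size limits how much data is lost around each unreadable sector, at the cost of speed.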