[lustre-discuss] failed OST recover

Sergey Zhumatiy serg at parallel.ru
Mon Nov 30 22:44:30 PST 2020


> Many years ago, when I was using Lustre-1.8.X, I suffered the
> same nightmare you are facing now. The following procedure saved me,
> but I am not sure whether it will work for you.
> 
   Thank you! I had found this recipe, but it does not work with newer
Lustre versions: ll_recover_lost_found_objs no longer exists. I have
2.12.2 installed.
   As I understand it, its function is now integrated into the lfsck
procedure, but lfsck does not behave as I expect.

   Can anybody give me a clue how to force this procedure? Should I stop
all clients and run lfsck with the broken OST enabled? I do not want to
experiment while I have tens of users; a week of Lustre unavailability
without significant results would look very bad for me.
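
   To be concrete, this is roughly the invocation I have in mind. It is
only a sketch that prints the commands instead of running them. The file
system name "testfs", the MDT index 0000 and the helper name
print_lfsck_plan are placeholders of mine, and I am assuming the
lfsck_start options behave as the manual describes (-t layout for the
layout check, -o to handle orphan OST objects, -A to run on all targets,
-r to restart the scan from the beginning).

```shell
#!/bin/sh
# Dry-run sketch: prints the lfsck commands for review, runs nothing.
# "testfs" and the MDT index 0000 are placeholders, not my real names.

print_lfsck_plan() {
    fsname="$1"                 # Lustre file system name (placeholder)
    mdt="${fsname}-MDT0000"     # assumed MDT target name

    # Layout scan from the beginning, on all targets, with orphan handling.
    echo "lctl lfsck_start -M ${mdt} -t layout -o -A -r"
    # Progress and status of the layout scan, to be read on the MDS.
    echo "lctl get_param -n mdd.${mdt}.lfsck_layout"
}

print_lfsck_plan testfs
```

   Whether the broken OST has to be enabled and the clients stopped while
this runs is exactly what I do not know.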

> 1. umount all the clients, umount OST.
> 
> 2. mount OST as ldiskfs:
> 
> 	mount -t ldiskfs /dev/<OST_device> /mnt
> 
> 3. Run the command:
> 
>     ll_recover_lost_found_objs -d <lost+found_dir>
> 
> On that occasion it restored about 70% of the data.
> 
> 
> If you want to remove the files that were lost on the OST, but
> "rm -f <filename>" does not work:
> 
> 1. Record the full paths of the files which you want to remove.
> 
> 2. umount all clients, the OST, and the MDT.
> 
> 3. Mount MDT as ldiskfs:
> 
> 	mount -t ldiskfs /dev/<MDT_device> /mnt
> 
> 4. Go to /mnt/ROOT/. You will find the complete directory tree of
>     your Lustre file system, but without the file contents. You can
>     remove the files you want from here.
> 
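
   For step 4, the paths recorded on the clients have to be translated to
paths under /mnt/ROOT. A small sketch of that mapping, which only prints
the rm commands for review and deletes nothing. The client mount point
/lustre, the example file name and the helper names mdt_path and
print_rm_plan are my assumptions, not part of the recipe above.

```shell
#!/bin/sh
# Dry-run sketch: map recorded client paths to the ldiskfs-mounted MDT.
# /lustre (client mount point) and /mnt (MDT mount point) are assumptions.

# Print the MDT-side path for one client path; fail if the path is
# not located under the client mount point.
mdt_path() {
    client_mnt="$1"; mdt_mnt="$2"; path="$3"
    case "$path" in
        "$client_mnt"/*)
            rel=${path#"$client_mnt"/}      # strip the mount point prefix
            echo "${mdt_mnt}/ROOT/${rel}"
            ;;
        *)
            echo "skip (not under $client_mnt): $path" >&2
            return 1
            ;;
    esac
}

# Read client paths, one per line, and print the rm commands to review.
print_rm_plan() {
    while IFS= read -r p; do
        target="$(mdt_path /lustre /mnt "$p")" || continue
        printf "rm -f -- '%s'\n" "$target"
    done
}

printf '%s\n' /lustre/home/user/lost.dat | print_rm_plan
```

   In practice the list from step 1 would be fed to print_rm_plan from a
file, and the printed commands checked by hand before anything is removed.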
> 
> Cheers,
> 
> T.H.Hsieh
> 
> 
> On Mon, Nov 30, 2020 at 01:09:07PM +0300, Sergey Zhumatiy wrote:
>>    Hello!
>>    Please help me resolve this. One OST in my Lustre installation has
>> failed. It lost all fs metadata, so I couldn't mount it as a Lustre
>> filesystem. I checked it with e2fsck and all data was moved into the
>> lost+found folder. Then I moved this folder to another storage,
>> re-created the OST (with the old target index), and put the lost+found
>> folder back.
>>
>>    After mounting this OST as Lustre, I started lfsck on the MDS. After
>> several hours I disabled this OST, because no client could work. Then
>> Lustre became healthy, and I started lfs_migrate from this OST.
>>
>>    But it seems that the data was not restored by lfsck: lfs_migrate moved
>> only a few files, and the rest fail with 'endpoint not connected'.
>>
>>    How can I restore some data and delete unrecoverable data?
>>
>> -- 
>>    With respect
>>                                                     Serg.
>> _______________________________________________
>> lustre-discuss mailing list
>> lustre-discuss at lists.lustre.org
>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


-- 
   With respect
                                                    Serg.

