[Lustre-discuss] steps to take to replace a failed ost with permanent data loss

Fri Oct 15 22:31:00 PDT 2010

If the old OST is still accessible, you can copy its last_rcvd and O/0/LAST_ID files over to the reformatted OST, and it should then take on the identity of the old OST.
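A minimal sketch of that copy step, assuming both the old and the reformatted OST devices can be mounted as ldiskfs on the OSS. The helper name copy_ost_identity, the device names, and the mount points are placeholders for illustration and not part of any Lustre tool:

```shell
#!/bin/sh
# Hypothetical helper: copy the two identity files from the old OST's
# backing filesystem to the freshly formatted one. Both arguments are
# directories where the respective devices are mounted (type ldiskfs).
copy_ost_identity() {
    old_mnt=$1
    new_mnt=$2

    # Refuse to continue if either source file is missing.
    for f in last_rcvd O/0/LAST_ID; do
        if [ ! -f "$old_mnt/$f" ]; then
            echo "missing $old_mnt/$f" >&2
            return 1
        fi
    done

    # The object directory may not exist yet on a fresh OST.
    mkdir -p "$new_mnt/O/0" || return 1

    # -p preserves mode, ownership and timestamps.
    cp -p "$old_mnt/last_rcvd" "$new_mnt/last_rcvd" || return 1
    cp -p "$old_mnt/O/0/LAST_ID" "$new_mnt/O/0/LAST_ID" || return 1
}

# Example usage (placeholders, run as root with both OSTs stopped):
#   mount -t ldiskfs /dev/OLD_DEV /mnt/ost_old
#   mount -t ldiskfs /dev/NEW_DEV /mnt/ost_new
#   copy_ost_identity /mnt/ost_old /mnt/ost_new
#   umount /mnt/ost_old
#   umount /mnt/ost_new
```

This only applies when the old device can still be read. In the scenario quoted below, where the data is gone entirely, there is nothing to copy from.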

The only other thing that identifies the filesystem is the label, which should be set by mkfs.lustre if the index is specified.
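As a rough illustration of the label point, the sketch below computes the label one would expect for a given index, assuming the usual fsname-OSTxxxx form with the index as four hex digits. The helper name ost_label, the filesystem name, the MGS NID, and the device name are placeholders, and the exact label shown before the OST first registers with the MGS may differ between Lustre versions:

```shell
#!/bin/sh
# Hypothetical helper: print the label expected for an OST, assuming the
# conventional "<fsname>-OST<index as 4 hex digits>" form.
#   $1 = filesystem name, $2 = OST index (decimal)
ost_label() {
    printf '%s-OST%04x\n' "$1" "$2"
}

# Example (placeholders): reformat the replacement device with the old
# index, then compare the label on disk with the expected one.
#   mkfs.lustre --ost --reformat --fsname=testfs \
#       --mgsnode=MGS_NID --index=10 /dev/NEW_DEV
#   e2label /dev/NEW_DEV
#   ost_label testfs 10
```

If the two labels disagree, the index passed to mkfs.lustre is the first thing to check.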

Cheers, Andreas

On 2010-10-14, at 13:22, Lisa Giacchetti <lisa at fnal.gov> wrote:
> I am looking for a definitive list of steps a Lustre cluster admin should take to recover from the following scenario:
>   1) an OST in the cluster has had a permanent data failure: the data cannot be recovered, but the
>       device itself will be fixed. Please assume that the device is NOT mounted any more on the OSS it
>       was being served from and therefore is NOT listed in the "lctl dl" command on that OSS.
>   2) data lost is not needed and there are no backups of it
>   3) It would be beneficial to be able to replace the OST as the same device (i.e. reuse the index),
>       but please include what is used in the "--index" parameter of each command as the documentation
>       on this is severely lacking
>   4) the MGS and MDT are running on two separate servers
>   5) there is no failover of any kind set up
> 
> I have tried to find the appropriate steps to take and commands to use from within the docs and
> have been unsuccessful. So unsuccessful that I have had to remake my entire cluster.
> If you need more clarification on the scenario before being able to tell me what steps to take - please
> ask for the info you need.
> 
> Anyone?
> 
> Lisa Giacchetti
> 