[Lustre-discuss] OST went down running Lustre 1.6.6

Brian Stone bgstone at sgi.com
Wed Feb 11 08:53:50 PST 2009


So, in this case, it appears that not all clients completed recovery 
and recovery timed out. I assume you have only a short window in which to 
figure out which clients did not participate in recovery and get them to 
reconnect. What's the best way to identify clients that are not 
participating in recovery? And what's the best way to get those clients 
to reconnect?

Thanks,
Brian Stone

Brian J. Murrell wrote:
> On Wed, 2009-02-11 at 10:46 -0500, Brian Stone wrote:
>   
>> Yes, I was using corruption to mean incomplete files.
>>     
>
> Ahhh.  OK.  That can be an artifact of a failure to complete recovery.
>
>   
>> So, let me 
>> rephrase, is there a way to avoid "incomplete files" after an OSS crash, 
>>     
>
> Yes, make sure that all clients are available to reconnect and
> participate in recovery.
>
>   
>> The lustre devices were not deactivated from the clients 
>> and MDS, would that possibly avoid the purge of data?
>>     
>
> No.  You are free to reboot an OSS any time you wish.  All clients will
> wait for it to come back up and recovery will proceed and complete, if
> all clients are still present to participate.
>
> Recovery is basically a process by which clients have an opportunity to
> retry any recent (i.e. in-progress) transactions that may have gotten
> "lost" when an OSS crashes or reboots.  Until the OSS confirms to the
> client that it has written a given transaction to stable storage, the
> client holds on to it in case it needs to replay it.
>
> In the event of a crash, all clients reconnect and offer their recent
> transactions.  However, this must (currently) be done serially, in the
> same order in which they were originally performed.  If a given client
> fails to connect, the server cannot know whether that client had any
> transactions to replay, and must therefore discard all further
> transactions and abort recovery, which can lead to this "data loss".
>
> b.
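
The ordered-replay rule described in the quoted reply can be illustrated with a toy model. This is only a sketch under stated assumptions, not Lustre's actual recovery code: the function name `recover`, the integer transaction numbers, and the client names are all hypothetical. It assumes the server knows the last transaction number it committed, that each reconnecting client offers the numbers of its unconfirmed transactions, and that a gap in the sequence is fatal only when some expected client is absent.

```python
def recover(last_committed, expected_clients, reconnected):
    """Toy model of ordered replay (hypothetical, not Lustre source).

    last_committed   -- highest transaction number already on stable storage
    expected_clients -- names of clients connected before the crash
    reconnected      -- dict: client name -> list of unconfirmed transaction numbers
    Returns (replayed, discarded, missing_clients).
    """
    missing = sorted(set(expected_clients) - set(reconnected))

    # Merge everything the clients offer into one stream, in original order.
    offered = sorted(
        transno
        for transnos in reconnected.values()
        for transno in transnos
        if transno > last_committed
    )

    replayed, discarded = [], []
    next_transno = last_committed + 1
    aborted = False
    for transno in offered:
        if not aborted and transno != next_transno and missing:
            # There is a gap, and an absent client might hold the missing
            # transaction, so nothing after the gap can safely be applied.
            aborted = True
        if aborted:
            discarded.append(transno)
        else:
            replayed.append(transno)
            next_transno = transno + 1
    return replayed, discarded, missing


expected = ["c1", "c2", "c3"]

# Every client reconnects: the full sequence 101..104 is replayed.
print(recover(100, expected, {"c1": [101, 104], "c2": [102], "c3": [103]}))
# ([101, 102, 103, 104], [], [])

# c3 never reconnects: 103 is unknown, so 104 is discarded.
print(recover(100, expected, {"c1": [101, 104], "c2": [102]}))
# ([101, 102], [104], ['c3'])
```

In the second call, transaction 104 was offered by a client that did reconnect, yet it is still thrown away because the server cannot rule out that the absent client held 103. That is the "incomplete files" effect discussed in this thread.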
