[Lustre-discuss] Aborting recovery

Thomas Roth t.roth at gsi.de
Fri Mar 6 12:22:22 PST 2009



Brian J. Murrell wrote:
> On Fri, 2009-03-06 at 20:09 +0100, Thomas Roth wrote:
>> But this is not what our users observe. Even on an otherwise perfectly
>> working system, they report I/O errors on access to some files.
> 
> EIO == eviction.
> 
>> I can usually see something happening in the logs of OST and client:
>> The OST starts with "timeout on bulk PUT after 6+0s", which the OST is
>> first "ignoring bulk IO comm error" in the hope that "client will
>> retry".
> 
> Wait a minute.  This thread is about server recovery, not communications
> failures.  You are mixing up errors and situations here.
> 
> Communications failures will result in timeouts on the server and that
> will result in evictions which will result in EIOs for your
> applications.  This has got nothing to do with server recovery though.

You are right, of course; this comes from a different situation. I had
simply assumed that if a client cannot cope with a one-second interruption
caused by a communication failure (the result being an EIO), then it, or
rather the application, could hardly survive an interruption of the entire
system lasting several hours.
Of course, if the client reacts differently during server recovery, the
application will also see things differently.
I guess that's what I misunderstood. In fact, the client's logs during
yesterday's recovery don't look bad at all ;-) Just a number of
"Request xyz sent from MDT0000-mdc to NID MGS ... timed out", as expected.
Thanks for pointing this out.
Thomas

>> "Request ... has timed out
>> (limit 7s)", "Connection to service was lost; in progress operations
>> using this service will fail", finally "Connection restored to service".
> 
> Yes.  This is a timeout and nothing to do with the subject of server
> recovery.
> 
> b.
> 
> 
> 
> ------------------------------------------------------------------------
> 
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss



