[Lustre-discuss] Aborting recovery

Brian J. Murrell Brian.Murrell at Sun.COM
Fri Mar 6 11:55:51 PST 2009


On Fri, 2009-03-06 at 20:09 +0100, Thomas Roth wrote:
> 
> But this is not what our users observe. Even on an otherwise perfectly
> working system, they report I/O errors on access to some files.

EIO == eviction.

> I can usually see something happening in the logs of the OST and the
> client: the OST starts with "timeout on bulk PUT after 6+0s", which it
> first handles by "ignoring bulk IO comm error" in the hope that "client
> will retry".

Wait a minute.  This thread is about server recovery, not communications
failures.  You are mixing up errors and situations here.

Communications failures lead to timeouts on the server, the timeouts
lead to evictions, and the evictions show up as EIOs in your
applications.  None of this has anything to do with server recovery.
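
Seen from the application side, the end of that chain is just an
ordinary I/O error with errno set to EIO.  A minimal Python sketch of
how a user program might flag it follows.  Nothing in it is
Lustre-specific: the function names describe_io_failure and read_file
are made up for illustration, and the hint text only restates the
timeout -> eviction -> EIO chain described above.

```python
import errno
import os


def describe_io_failure(exc):
    """Turn an OSError from file I/O into a hint about where to look next.

    Assumption: on the client, an eviction surfaces to the application
    as EIO, as described in this thread. Other errnos are passed through.
    """
    if exc.errno == errno.EIO:
        return "EIO: look for a timeout and client eviction in the server logs"
    # exc.errno can be None for a hand-built OSError, so fall back to 0.
    return "not EIO: " + os.strerror(exc.errno or 0)


def read_file(path):
    """Read a whole file; return (data, None) or (None, hint) on failure."""
    try:
        with open(path, "rb") as f:
            return f.read(), None
    except OSError as exc:
        return None, describe_io_failure(exc)
```

An EIO reported this way says the client was probably evicted; it says
nothing about whether any server was in recovery at the time.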

> "Request ... has timed out
> (limit 7s)", "Connection to service was lost; in progress operations
> using this service will fail", finally "Connection restored to service".

Yes.  This is a timeout and has nothing to do with the subject of
server recovery.
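
If it helps to keep the two situations apart when reading logs, the
fragments quoted in this message can be used as a rough filter for
"communication timeout" symptoms.  This is only a sketch: the phrase
list is just the fragments quoted above, not a complete or official
list of Lustre messages, real log lines carry more context around
them, and the names TIMEOUT_PHRASES, is_comm_timeout_line and
comm_timeout_lines are hypothetical.

```python
# Fragments copied from the log excerpts quoted in this message.
# "Request ... has timed out" is reduced to its fixed tail because the
# middle of that message was elided in the quote.
TIMEOUT_PHRASES = (
    "timeout on bulk PUT",
    "ignoring bulk IO comm error",
    "has timed out",
    "Connection to service was lost",
    "Connection restored to service",
)


def is_comm_timeout_line(line):
    """True if the line contains one of the quoted timeout fragments."""
    return any(phrase in line for phrase in TIMEOUT_PHRASES)


def comm_timeout_lines(lines):
    """Return only the lines that look like communication timeout symptoms."""
    return [line for line in lines if is_comm_timeout_line(line)]
```

Lines that match point at a communications problem and a possible
eviction, not at server recovery.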

b.
