[Lustre-discuss] Timeouts and Dumps

Andreas Dilger adilger at sun.com
Mon Dec 22 19:49:53 PST 2008


On Dec 22, 2008  13:22 -0700, Denise Hummel wrote:
> Dec 22 13:00:44 oss1 kernel: LustreError: 138-a: lustre-OST0000: A
> client on nid 172.16.100.1@tcp was evicted due to a lock blocking
> callback to 172.16.100.1@tcp timed out: rc -107
> Dec 22 13:00:44 oss1 kernel: LustreError:
> 27250:0:(ost_handler.c:1065:ost_brw_write()) @@@ Eviction on bulk GET
> req@00000100bff5c800 x91545/t0
> 27250:0:(ost_handler.c:1205:ost_brw_write()) lustre-OST0000: ignoring
> bulk IO comm error with

These messages could point to network problems on the oss1 node.  That
explanation is most relevant if oss1 is the only node showing these messages.
In particular, "Eviction on bulk GET" indicates that the network stopped
working in the middle of a data transfer.
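
One way to check whether oss1 really is the only server affected is to tally
eviction-related LustreError lines per host across the collected syslogs.
What follows is only a minimal sketch, assuming standard syslog lines of the
form "Mon DD HH:MM:SS host kernel: LustreError: ...", one message per line.
The function name, the abbreviated sample lines, and the oss2 host are
hypothetical and serve only as illustration.

```python
import re
from collections import Counter

# Assumed syslog layout: "Dec 22 13:00:44 oss1 kernel: LustreError: ..."
LINE_RE = re.compile(
    r"^\w{3}\s+\d+\s+[\d:]+\s+(?P<host>\S+)\s+kernel:\s+LustreError:"
)

# Substrings taken from the messages quoted above.
KEYWORDS = ("was evicted", "Eviction on bulk")


def count_evictions(lines):
    """Count eviction-related LustreError lines per reporting host."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match and any(keyword in line for keyword in KEYWORDS):
            counts[match.group("host")] += 1
    return counts


# Abbreviated, partly hypothetical sample input (oss2 is made up).
sample = [
    "Dec 22 13:00:44 oss1 kernel: LustreError: 138-a: lustre-OST0000: "
    "A client was evicted due to a lock blocking callback",
    "Dec 22 13:00:44 oss1 kernel: LustreError: "
    "27250:0:(ost_handler.c:1065:ost_brw_write()) @@@ Eviction on bulk GET",
    "Dec 22 13:01:02 oss2 kernel: Lustre: some unrelated message",
]

if __name__ == "__main__":
    for host, n in count_evictions(sample).most_common():
        print(f"{host}: {n} eviction-related messages")
```

If the counts are concentrated on a single OSS, that node's network path is
the first place to look; if they are spread across servers, the affected
client or the shared network is the more likely suspect.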


> The messages in the syslog on the login node are:
> lustre-OST0000-osc-000001018197f800: Connection to service
> lustre-OST0000 via nid 172.16.100.41@tcp was lost; in progress
> operations using this service will wait for recovery to complete.

This is just the client's version of the same issue.


Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.



