[Lustre-discuss] "unexpectedly long timeout"

Brock Palen brockp at umich.edu
Wed Nov 5 12:53:07 PST 2008


New error I have never seen before; googling didn't find much other
than an error involving IB. This node has IB, but Lustre runs over TCP.

Nov  5 02:19:54 nyx668 kernel: Lustre: 4329:0:(niobuf.c:305:ptlrpc_unregister_bulk()) @@@ Unexpectedly long timeout: desc 000001041802f600  req@0000010005151a00 x1071812/t0 o4->nobackup-OST000c_UUID@10.164.3.245@tcp:6/4 lens 384/480 e 0 to 100 dl 1225842598 ref 2 fl Rpc:X/0/0 rc 0/0
Nov  5 02:19:54 nyx668 kernel: Lustre: 4329:0:(niobuf.c:305:ptlrpc_unregister_bulk()) Skipped 1 previous similar message
Nov  5 02:29:54 nyx668 kernel: Lustre: 4329:0:(niobuf.c:305:ptlrpc_unregister_bulk()) @@@ Unexpectedly long timeout: desc 000001041802f600  req@0000010005151a00 x1071812/t0 o4->nobackup-OST000c_UUID@10.164.3.245@tcp:6/4 lens 384/480 e 0 to 100 dl 1225842598 ref 2 fl Rpc:X/0/0 rc 0/0

On the OSS that provides OST000c, the only errors I see from that
node are the usual 'can't hear from node' ones:

Nov  4 18:46:02 oss2 kernel: Lustre: 6426:0:(ost_handler.c:1270:ost_brw_write()) nobackup-OST000c: ignoring bulk IO comm error with 0d8e8d79-bfac-9d81-a345-39aaf2d4bc0e@NET_0x200000aa4029c_UUID id 12345-10.164.2.156@tcp - client will retry
Nov  4 18:49:42 oss2 kernel: Lustre: nobackup-OST000c: haven't heard from client 0d8e8d79-bfac-9d81-a345-39aaf2d4bc0e (at 10.164.2.156@tcp) in 227 seconds. I think it's dead, and I am evicting it.
Nov  4 18:49:42 oss2 kernel: Lustre: nobackup-OST000d: haven't heard from client 0d8e8d79-bfac-9d81-a345-39aaf2d4bc0e (at 10.164.2.156@tcp) in 227 seconds. I think it's dead, and I am evicting it.
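
As an aid to reading the client-side request line above, here is a minimal
Python sketch that pulls out a few of its fields. The field meanings used
here (x = request id, o = opcode, "to" = timeout in seconds, "dl" = deadline
as a Unix timestamp) are assumptions based on the usual layout of these
Lustre request dumps, not something the logs themselves state. The name
parse_req is made up for illustration, and the sample string is the log line
with the mail wrapping removed and "@" restored.

```python
import re
from datetime import datetime, timezone

# Request portion of the client log line (wrapping removed, "@" restored).
SAMPLE = (
    "x1071812/t0 o4->nobackup-OST000c_UUID@10.164.3.245@tcp:6/4 "
    "lens 384/480 e 0 to 100 dl 1225842598 ref 2 fl Rpc:X/0/0 rc 0/0"
)

# Assumed layout: x<id>/t<transno> o<opcode>-><target>@<nid>:<a>/<b>
#                 lens <req>/<rep> e <n> to <timeout> dl <deadline> ...
REQ_RE = re.compile(
    r"x(?P<xid>\d+)/t(?P<transno>\d+)\s+"
    r"o(?P<opcode>\d+)->(?P<target>[^@\s]+)@(?P<nid>\S+?):\d+/\d+\s+"
    r"lens\s+(?P<reqlen>\d+)/(?P<replen>\d+)\s+"
    r"e\s+\d+\s+to\s+(?P<timeout>\d+)\s+dl\s+(?P<deadline>\d+)"
)


def parse_req(text):
    """Return a dict of selected fields, or None if the layout differs."""
    m = REQ_RE.search(text)
    if m is None:
        return None
    return {
        "xid": int(m.group("xid")),
        "opcode": int(m.group("opcode")),      # 4 is believed to be a write
        "target": m.group("target"),
        "nid": m.group("nid"),
        "timeout": int(m.group("timeout")),    # assumed to be seconds
        # Assumption: "dl" is seconds since the Unix epoch.
        "deadline": datetime.fromtimestamp(
            int(m.group("deadline")), tz=timezone.utc
        ),
    }


if __name__ == "__main__":
    for key, value in parse_req(SAMPLE).items():
        print(f"{key}: {value}")
```

If the "dl" assumption holds, the value in this line decodes to
2008-11-04 23:49:58 UTC. Assuming the nodes log in US Eastern time, that is
18:49:58 on Nov 4, close to the eviction logged on oss2 at 18:49:42.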

Any thoughts?


Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
brockp at umich.edu
(734)936-1985





