[Lustre-discuss] "unexpectedly long timeout"
Brock Palen
brockp at umich.edu
Wed Nov 5 12:53:07 PST 2008
This is a new error I have never seen before. Googling didn't find much
other than an error involving IB. This node has IB, but Lustre runs over TCP.
Nov 5 02:19:54 nyx668 kernel: Lustre: 4329:0:(niobuf.c:305:ptlrpc_unregister_bulk()) @@@ Unexpectedly long timeout: desc 000001041802f600 req@0000010005151a00 x1071812/t0 o4->nobackup-OST000c_UUID@10.164.3.245@tcp:6/4 lens 384/480 e 0 to 100 dl 1225842598 ref 2 fl Rpc:X/0/0 rc 0/0
Nov 5 02:19:54 nyx668 kernel: Lustre: 4329:0:(niobuf.c:305:ptlrpc_unregister_bulk()) Skipped 1 previous similar message
Nov 5 02:29:54 nyx668 kernel: Lustre: 4329:0:(niobuf.c:305:ptlrpc_unregister_bulk()) @@@ Unexpectedly long timeout: desc 000001041802f600 req@0000010005151a00 x1071812/t0 o4->nobackup-OST000c_UUID@10.164.3.245@tcp:6/4 lens 384/480 e 0 to 100 dl 1225842598 ref 2 fl Rpc:X/0/0 rc 0/0
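To line these messages up against the server-side log, it can help to decode the numeric fields in the request dump. The following is only a sketch: it assumes that `to` is the request timeout in seconds and that `dl` is the request deadline as a Unix epoch timestamp, which is my reading of the debug format and not something the log states. The function name `parse_deadline` is made up for illustration.

```python
import re
from datetime import datetime, timezone

# Tail of the request dump from the client log above.
LINE = "lens 384/480 e 0 to 100 dl 1225842598 ref 2 fl Rpc:X/0/0 rc 0/0"


def parse_deadline(line):
    """Return (timeout_seconds, deadline_utc) from a request dump, or None.

    Assumption: 'to N' is the timeout in seconds and 'dl N' is a Unix
    epoch deadline. Neither meaning is confirmed by the log itself.
    """
    match = re.search(r"\bto (\d+) dl (\d+)\b", line)
    if match is None:
        return None
    timeout_s = int(match.group(1))
    deadline = datetime.fromtimestamp(int(match.group(2)), tz=timezone.utc)
    return timeout_s, deadline


if __name__ == "__main__":
    timeout_s, deadline = parse_deadline(LINE)
    print("timeout:", timeout_s, "s")
    print("deadline (UTC):", deadline.strftime("%Y-%m-%d %H:%M:%S"))
```

Under those assumptions, the request above carries a 100 second timeout and a deadline of 2008-11-04 23:49:58 UTC. That value can be compared with the syslog timestamps once the nodes' local time zone is taken into account.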
On the OSS that provides OST000c, the only errors I see about that
node are the usual 'can't hear from node' messages:
Nov 4 18:46:02 oss2 kernel: Lustre: 6426:0:(ost_handler.c:1270:ost_brw_write()) nobackup-OST000c: ignoring bulk IO comm error with 0d8e8d79-bfac-9d81-a345-39aaf2d4bc0e@NET_0x200000aa4029c_UUID id 12345-10.164.2.156@tcp - client will retry
Nov 4 18:49:42 oss2 kernel: Lustre: nobackup-OST000c: haven't heard from client 0d8e8d79-bfac-9d81-a345-39aaf2d4bc0e (at 10.164.2.156@tcp) in 227 seconds. I think it's dead, and I am evicting it.
Nov 4 18:49:42 oss2 kernel: Lustre: nobackup-OST000d: haven't heard from client 0d8e8d79-bfac-9d81-a345-39aaf2d4bc0e (at 10.164.2.156@tcp) in 227 seconds. I think it's dead, and I am evicting it.
Any thoughts?
Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
brockp at umich.edu
(734)936-1985