[Lustre-discuss] Lustre unstable
Gizo Nanava
nanava at physik.uni-bonn.de
Tue Oct 12 06:38:39 PDT 2010
Hello,
Very often we see the following error on our Lustre clients, after which the client hangs:
~]# lfs check servers
error: check 'lustrefs-OST000f-osc-ffff810069b5d400' Resource temporarily unavailable
On the server side, the log shows:
===============
Oct 12 15:27:50 oss03 kernel: Lustre: 4070:0:(ldlm_lib.c:837:target_handle_connect()) lustrefs-OST000f: refuse reconnection from 68739331-439d-9818-9d70-47e53e0644c2@192.168.200.246@tcp to 0xffff810416a5f200; still busy with 1 active RPCs
Oct 12 15:27:50 oss03 kernel: Lustre: 4070:0:(ldlm_lib.c:837:target_handle_connect()) Skipped 19 previous similar messages
Oct 12 15:27:50 oss03 kernel: LustreError: 4070:0:(ldlm_lib.c:1848:target_send_reply_msg()) @@@ processing error (-16) req@ffff81042f788c00 x1348876697425130/t0 o8->68739331-439d-9818-9d70-47e53e0644c2@NET_0x20000c0a8c8f6_UUID:0/0 lens 368/264 e 0 to 0 dl 1286890170 ref 1 fl Interpret:/0/0 rc -16/0
Oct 12 15:27:50 oss03 kernel: LustreError: 4070:0:(ldlm_lib.c:1848:target_send_reply_msg()) Skipped 19 previous similar messages
Oct 12 15:30:03 oss03 kernel: Lustre: 4094:0:(ldlm_lib.c:540:target_handle_reconnect()) lustrefs-OST000f: 68739331-439d-9818-9d70-47e53e0644c2 reconnecting
Oct 12 15:30:03 oss03 kernel: Lustre: 4094:0:(ldlm_lib.c:540:target_handle_reconnect()) Skipped 37 previous similar messages
Oct 12 15:30:03 oss03 kernel: Lustre: 4094:0:(ldlm_lib.c:837:target_handle_connect()) lustrefs-OST000f: refuse reconnection from 68739331-439d-9818-9d70-47e53e0644c2@192.168.200.246@tcp to 0xffff810416a5f200; still busy with 1 active RPCs
Oct 12 15:30:03 oss03 kernel: Lustre: 4094:0:(ldlm_lib.c:837:target_handle_connect()) Skipped 37 previous similar messages
Oct 12 15:30:03 oss03 kernel: LustreError: 4094:0:(ldlm_lib.c:1848:target_send_reply_msg()) @@@ processing error (-16) req@ffff81041eb0b000 x1348876697426353/t0 o8->68739331-439d-9818-9d70-47e53e0644c2@NET_0x20000c0a8c8f6_UUID:0/0 lens 368/264 e 0 to 0 dl 1286890303 ref 1 fl Interpret:/0/0 rc -16/0
Oct 12 15:30:03 oss03 kernel: LustreError: 4094:0:(ldlm_lib.c:1848:target_send_reply_msg()) Skipped 37 previous similar messages
=============
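For reference, the numeric fields in the server log can be decoded with a small sketch. It assumes that Lustre return codes are negated Linux errno values and that a 64-bit LNet NID keeps the network (LND type in the high 16 bits, network number in the low 16 bits) in its upper 32 bits and the IPv4 address in its lower 32 bits. It also assumes LND type 2 is the socket driver ("tcp"). The helper names `decode_rc` and `decode_nid` are hypothetical. Under those assumptions, rc -16 is EBUSY, which is consistent with the "still busy with 1 active RPCs" refusal. The hex number inside NET_0x20000c0a8c8f6_UUID decodes to the same client, 192.168.200.246@tcp. The client-side text "Resource temporarily unavailable" is the C library's message for EAGAIN.

```python
import errno
import os

# Assumption: only the socket LND (type 2, "tcp") is needed for this log.
LND_NAMES = {2: "tcp"}


def decode_rc(rc: int) -> str:
    """Map a negative Lustre return code (assumed to be -errno) to its symbolic name."""
    return errno.errorcode.get(-rc, f"unknown({rc})")


def decode_nid(nid: int) -> str:
    """Decode a 64-bit LNet NID, assuming: upper 32 bits = network, lower 32 bits = IPv4."""
    net = nid >> 32
    addr = nid & 0xFFFFFFFF
    lnd_type = net >> 16      # which LNet driver, e.g. 2 -> socket/tcp (assumed)
    net_num = net & 0xFFFF    # network number, 0 prints as plain "tcp"
    ip = ".".join(str((addr >> shift) & 0xFF) for shift in (24, 16, 8, 0))
    name = LND_NAMES.get(lnd_type, f"lnd{lnd_type}")
    suffix = "" if net_num == 0 else str(net_num)
    return f"{ip}@{name}{suffix}"


if __name__ == "__main__":
    print(decode_rc(-16))               # server: "processing error (-16)" -> EBUSY
    print(os.strerror(errno.EAGAIN))    # client message text (on Linux)
    print(decode_nid(0x20000c0a8c8f6))  # -> 192.168.200.246@tcp
```

This only makes the log easier to read; it does not by itself explain why the OST keeps one RPC from the old connection active.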
The Lustre file servers are set up following almost all of the recommendations in the manual (except for separate RAID controllers for each OST), but we still get these annoying messages.
Is this a bug? Should we upgrade to a newer version?
Kernel: 2.6.18-164.11.1.el5
Lustre: 1.8.2
Thank you