[Lustre-discuss] Lustre unstable

Gizo Nanava nanava at physik.uni-bonn.de
Tue Oct 12 06:38:39 PDT 2010


  Hello,

    We very often see the following error message on Lustre clients,
after which the client hangs:

   ~]# lfs check servers
   error: check 'lustrefs-OST000f-osc-ffff810069b5d400' Resource temporarily unavailable

On the server side, the log shows:

===============
Oct 12 15:27:50 oss03 kernel: Lustre: 4070:0:(ldlm_lib.c:837:target_handle_connect()) lustrefs-OST000f: refuse reconnection from 68739331-439d-9818-9d70-47e53e0644c2@192.168.200.246@tcp to 0xffff810416a5f200; still busy with 1 active RPCs
Oct 12 15:27:50 oss03 kernel: Lustre: 4070:0:(ldlm_lib.c:837:target_handle_connect()) Skipped 19 previous similar messages
Oct 12 15:27:50 oss03 kernel: LustreError: 4070:0:(ldlm_lib.c:1848:target_send_reply_msg()) @@@ processing error (-16)  req@ffff81042f788c00 x1348876697425130/t0 o8->68739331-439d-9818-9d70-47e53e0644c2@NET_0x20000c0a8c8f6_UUID:0/0 lens 368/264 e 0 to 0 dl 1286890170 ref 1 fl Interpret:/0/0 rc -16/0
Oct 12 15:27:50 oss03 kernel: LustreError: 4070:0:(ldlm_lib.c:1848:target_send_reply_msg()) Skipped 19 previous similar messages
Oct 12 15:30:03 oss03 kernel: Lustre: 4094:0:(ldlm_lib.c:540:target_handle_reconnect()) lustrefs-OST000f: 68739331-439d-9818-9d70-47e53e0644c2 reconnecting
Oct 12 15:30:03 oss03 kernel: Lustre: 4094:0:(ldlm_lib.c:540:target_handle_reconnect()) Skipped 37 previous similar messages
Oct 12 15:30:03 oss03 kernel: Lustre: 4094:0:(ldlm_lib.c:837:target_handle_connect()) lustrefs-OST000f: refuse reconnection from 68739331-439d-9818-9d70-47e53e0644c2@192.168.200.246@tcp to 0xffff810416a5f200; still busy with 1 active RPCs
Oct 12 15:30:03 oss03 kernel: Lustre: 4094:0:(ldlm_lib.c:837:target_handle_connect()) Skipped 37 previous similar messages
Oct 12 15:30:03 oss03 kernel: LustreError: 4094:0:(ldlm_lib.c:1848:target_send_reply_msg()) @@@ processing error (-16)  req@ffff81041eb0b000 x1348876697426353/t0 o8->68739331-439d-9818-9d70-47e53e0644c2@NET_0x20000c0a8c8f6_UUID:0/0 lens 368/264 e 0 to 0 dl 1286890303 ref 1 fl Interpret:/0/0 rc -16/0
Oct 12 15:30:03 oss03 kernel: LustreError: 4094:0:(ldlm_lib.c:1848:target_send_reply_msg()) Skipped 37 previous similar messages
=============
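
As a reading aid for the log above, here is a minimal Python sketch that
decodes two fields of the "processing error" lines: the return code (rc)
and the NET_0x..._UUID peer identifier. The helper names are hypothetical.
The NID layout used here (upper 32 bits = network, with type 2 meaning
tcp; lower 32 bits = IPv4 address) is an assumption, though it agrees
with the 192.168.200.246@tcp address printed in the "refuse reconnection"
lines. The errno names are the ones Python reports on Linux.

```python
import errno
import ipaddress
import re

# Tail of one "processing error" line, copied from the log above.
SAMPLE = ("NET_0x20000c0a8c8f6_UUID:0/0 lens 368/264 e 0 to 0 "
          "dl 1286890170 ref 1 fl Interpret:/0/0 rc -16/0")

RC_RE = re.compile(r"\brc (-?\d+)/")                 # e.g. "rc -16/0"
NID_RE = re.compile(r"NET_(0x[0-9a-fA-F]+)_UUID")    # e.g. "NET_0x20000c0a8c8f6_UUID"


def decode_rc(line):
    """Return the symbolic errno name for the rc field, or None if absent."""
    match = RC_RE.search(line)
    if match is None:
        return None
    rc = int(match.group(1))
    # Kernel return codes are negated errno values.
    return errno.errorcode.get(-rc, "unknown")


def decode_nid(line):
    """Return (ipv4_address, net_type, net_number), or None if absent.

    Assumed layout: upper 32 bits hold (net_type << 16 | net_number),
    lower 32 bits hold the IPv4 address.
    """
    match = NID_RE.search(line)
    if match is None:
        return None
    nid = int(match.group(1), 16)
    addr = nid & 0xFFFFFFFF
    net = nid >> 32
    return str(ipaddress.IPv4Address(addr)), net >> 16, net & 0xFFFF


if __name__ == "__main__":
    print(decode_rc(SAMPLE))
    print(decode_nid(SAMPLE))
```

On Linux, errno 16 is EBUSY, which matches the "still busy with 1 active
RPCs" text in the refusal message; the decoded address is the same client
that keeps trying to reconnect.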

The Lustre file servers are set up following almost all of the
recommendations in the manual (except for separate RAID controllers for
each OST), but we still get these annoying messages.

Is this a bug? Should we upgrade to a newer version?

Kernel: 2.6.18-164.11.1.el5
Lustre: 1.8.2

  Thank you






