[Lustre-discuss] strange slowdown

Aaron Knister aaron at iges.org
Tue Dec 18 10:05:17 PST 2007


I have some logs from the MDS/OSS startup. One thing I see
consistently on the OSTs having this problem is "still busy with 9
active RPCs". The number is always 9 when it hangs. I've attached
some of the dk logs. Thank you so much for your help.

00010000:00000400:0:1198000430.377418:0:5835:0:(ldlm_lib.c:517:target_handle_reconnect()) data-OST0004: 5ca85557-1b99-442d-d494-f1096d3fa4c4 reconnecting
00010000:00000400:0:1198000430.377496:0:5835:0:(ldlm_lib.c:746:target_handle_connect()) data-OST0004: refuse reconnection from 5ca85557-1b99-442d-d494-f1096d3fa4c4@192.168.64.102@o2ib to 0xffff810405037000; still busy with 9 active RPCs
00010000:00020000:0:1198000430.377787:0:5835:0:(ldlm_lib.c:1460:target_send_reply_msg()) @@@ processing error (-16) req@ffff810407a35400 x35081142/t0 o8->5ca85557-1b99-442d-d494-f1096d3fa4c4@NET_0x50000c0a84066_UUID:0/0 lens 304/200 e 0 to 0 dl 1198000530 ref 1 fl Interpret:/0/0 rc -16/0
00010000:00000400:0:1198000455.377760:0:5839:0:(ldlm_lib.c:517:target_handle_reconnect()) data-OST0004: 5ca85557-1b99-442d-d494-f1096d3fa4c4 reconnecting
00010000:00000400:0:1198000455.378250:0:5839:0:(ldlm_lib.c:746:target_handle_connect()) data-OST0004: refuse reconnection from 5ca85557-1b99-442d-d494-f1096d3fa4c4@192.168.64.102@o2ib to 0xffff810405037000; still busy with 9 active RPCs
00010000:00020000:0:1198000455.378955:0:5839:0:(ldlm_lib.c:1460:target_send_reply_msg()) @@@ processing error (-16) req@ffff810423ceba00 x35081148/t0 o8->5ca85557-1b99-442d-d494-f1096d3fa4c4@NET_0x50000c0a84066_UUID:0/0 lens 304/200 e 0 to 0 dl 1198000555 ref 1 fl Interpret:/0/0 rc -16/0
Debug log: 6 lines, 6 kept, 0 dropped.
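
For anyone who wants to check the attached dk logs for the same pattern,
here is a rough Python sketch that counts the refused reconnections per
OST and client and reports the time between attempts. The regular
expression is only inferred from the lines pasted above, not from any
documented log format, and the names REFUSED and summarize are made up
for this example. (The -16 in the error lines is EBUSY on Linux, which
fits the "still busy" message.)

```python
import re
from collections import defaultdict

# Two lines copied from the log above, with the mail wrapping joined.
SAMPLE = """\
00010000:00000400:0:1198000430.377496:0:5835:0:(ldlm_lib.c:746:target_handle_connect()) data-OST0004: refuse reconnection from 5ca85557-1b99-442d-d494-f1096d3fa4c4@192.168.64.102@o2ib to 0xffff810405037000; still busy with 9 active RPCs
00010000:00000400:0:1198000455.378250:0:5839:0:(ldlm_lib.c:746:target_handle_connect()) data-OST0004: refuse reconnection from 5ca85557-1b99-442d-d494-f1096d3fa4c4@192.168.64.102@o2ib to 0xffff810405037000; still busy with 9 active RPCs
"""

# Assumption: the 4th colon-separated field is the timestamp, and the
# message text follows the "(file:line:function())" part.
REFUSED = re.compile(
    r"^(?:[0-9a-f]+:){2}\d+:(?P<ts>\d+\.\d+):.*?"
    r"\)\) (?P<target>\S+): refuse reconnection from (?P<client>[0-9a-f-]+)"
    r".*still busy with (?P<rpcs>\d+) active RPCs"
)


def summarize(lines):
    """Group refused reconnects by (target, client, active RPC count).

    Returns a dict mapping that key to a sorted list of timestamps.
    """
    seen = defaultdict(list)
    for line in lines:
        m = REFUSED.search(line)
        if m:
            key = (m.group("target"), m.group("client"), int(m.group("rpcs")))
            seen[key].append(float(m.group("ts")))
    return {key: sorted(stamps) for key, stamps in seen.items()}


if __name__ == "__main__":
    for (target, client, rpcs), stamps in summarize(SAMPLE.splitlines()).items():
        gaps = [round(b - a, 3) for a, b in zip(stamps, stamps[1:])]
        print(f"{target} {client}: {len(stamps)} refusals, "
              f"{rpcs} active RPCs, gaps (s): {gaps}")
```

To run it against a real dump, pass the lines of the file instead of
SAMPLE, e.g. summarize(open(path)) where path is wherever the dk output
was saved.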

-------------- next part --------------
A non-text attachment was scrubbed...
Name: dklogs.zip
Type: application/zip
Size: 38189 bytes
Desc: not available
URL: <http://lists.lustre.org/pipermail/lustre-discuss-lustre.org/attachments/20071218/84b3dcf2/attachment.zip>
-------------- next part --------------




On Dec 15, 2007, at 11:32 PM, Oleg Drokin wrote:

>
> On Dec 13, 2007, at 6:52 PM, Aaron Knister wrote:
>
>> Just kidding...I spoke WAY too soon. It's acting up again.
>
> Unfortunately, all of the logs you provided so far are from the middle
> of the problem, when the Lustre client had already been evicted. How
> about the point when it all started?
> Also, sysrq-t output from the OSSes, to find out what they are doing,
> would be useful.
>
> Bye,
>    Oleg

Aaron Knister
Associate Systems Administrator/Web Designer
Center for Research on Environment and Water

(301) 595-7001
aaron at iges.org




