[Lustre-discuss] strange slowdown

Aaron Knister aaron at iges.org
Thu Dec 13 08:48:40 PST 2007


On the client i see this --

00000100:00100000:1:1197564425.523422:0:4024:0:(client.c: 
1339:ptlrpc_free_committed()) data-MDT0000-mdc-ffff8108275ec800: skip  
recheck: last_committed 54963341
00000100:00100000:1:1197564425.523425:0:4024:0:(client.c: 
949:ptlrpc_check_set()) Completed RPC pname:cluuid:pid:xid:nid:opc  
ptlrpcd:87702303- 
a9bb-4c4f-6494-392157b9b9f7:4024:370:192.168.64.70 at o2ib:400
00000100:00100000:1:1197564425.523434:0:4024:0:(client.c: 
1339:ptlrpc_free_committed()) data-OST0000-osc-ffff8108275ec800: skip  
recheck: last_committed 13452811
00000100:00100000:1:1197564425.523437:0:4024:0:(client.c: 
949:ptlrpc_check_set()) Completed RPC pname:cluuid:pid:xid:nid:opc  
ptlrpcd:87702303- 
a9bb-4c4f-6494-392157b9b9f7:4024:371:192.168.64.71 at o2ib:400
00000100:00100000:1:1197564425.523440:0:4024:0:(client.c: 
1339:ptlrpc_free_committed()) data-OST0001-osc-ffff8108275ec800: skip  
recheck: last_committed 13219554
00000100:00100000:1:1197564425.523443:0:4024:0:(client.c: 
949:ptlrpc_check_set()) Completed RPC pname:cluuid:pid:xid:nid:opc  
ptlrpcd:87702303- 
a9bb-4c4f-6494-392157b9b9f7:4024:372:192.168.64.71 at o2ib:400
00000100:00100000:1:1197564425.523470:0:4025:0:(client.c: 
733:ptlrpc_send_new_req()) Sending RPC pname:cluuid:pid:xid:nid:opc  
ptlrpcd-recov:87702303- 
a9bb-4c4f-6494-392157b9b9f7:4025:376:192.168.64.71 at o2ib:8
00000100:00100000:1:1197564425.523517:0:4024:0:(client.c: 
1339:ptlrpc_free_committed()) data-OST0002-osc-ffff8108275ec800: skip  
recheck: last_committed 12117104
00000100:00100000:1:1197564425.523520:0:4024:0:(client.c: 
949:ptlrpc_check_set()) Completed RPC pname:cluuid:pid:xid:nid:opc  
ptlrpcd:87702303- 
a9bb-4c4f-6494-392157b9b9f7:4024:373:192.168.64.71 at o2ib:400
00000100:00100000:1:1197564425.523524:0:4024:0:(client.c: 
1339:ptlrpc_free_committed()) data-OST0003-osc-ffff8108275ec800: skip  
recheck: last_committed 11979465
00000100:00100000:1:1197564425.523532:0:4024:0:(client.c: 
949:ptlrpc_check_set()) Completed RPC pname:cluuid:pid:xid:nid:opc  
ptlrpcd:87702303- 
a9bb-4c4f-6494-392157b9b9f7:4024:374:192.168.64.71 at o2ib:400
00000100:00100000:1:1197564425.523535:0:4024:0:(client.c: 
1339:ptlrpc_free_committed()) data-OST0004-osc-ffff8108275ec800: skip  
recheck: last_committed 13572310
00000100:00100000:1:1197564425.523537:0:4024:0:(client.c: 
949:ptlrpc_check_set()) Completed RPC pname:cluuid:pid:xid:nid:opc  
ptlrpcd:87702303- 
a9bb-4c4f-6494-392157b9b9f7:4024:375:192.168.64.71 at o2ib:400
00000100:02020000:1:1197564425.524253:0:4025:0:(client.c: 
581:ptlrpc_check_status()) 11-0: an error occurred while communicating  
with 192.168.64.71 at o2ib. The ost_connect operation failed with -16
00000100:00100000:1:1197564425.524256:0:4025:0:(client.c: 
1339:ptlrpc_free_committed()) data-OST0005-osc-ffff8108275ec800: skip  
recheck: last_committed 13074662
00000100:00080000:1:1197564425.524265:0:4025:0:(import.c: 
759:ptlrpc_connect_interpret()) ffff810827423200 data-OST0005_UUID:  
changing import state from CONNECTING to DISCONN
00000100:00080000:1:1197564425.524268:0:4025:0:(import.c: 
801:ptlrpc_connect_interpret()) recovery of data-OST0005_UUID on  
192.168.64.71 at o2ib failed (-16)
00000100:00100000:1:1197564425.524276:0:4025:0:(client.c: 
949:ptlrpc_check_set()) Completed RPC pname:cluuid:pid:xid:nid:opc  
ptlrpcd-recov:87702303- 
a9bb-4c4f-6494-392157b9b9f7:4025:376:192.168.64.71 at o2ib:8


and on the server --

00000100:00100000:0:1197564324.597517:0:6891:0:(events.c: 
259:request_in_callback()) @@@ incoming req  req at ffff810402a64a00 x344/ 
t0 o8-><?>@<?>:0/0 lens 304/0 e 0 to 0 dl 0 ref 1 fl New:/0/0 rc 0/0
00000100:00100000:2:1197564324.597537:0:7120:0:(service.c: 
1048:ptlrpc_server_handle_request()) Handled RPC pname:cluuid 
+ref:pid:xid:nid:opc ll_ost_125:87702303- 
a9bb-4c4f-6494-392157b9b9f7+6:4024:x343:12345-192.168.64.102 at o2ib:400
00000100:00100000:2:1197564324.597541:0:7120:0:(service.c: 
1076:ptlrpc_server_handle_request()) request x343 opc 400 from  
12345-192.168.64.102 at o2ib processed in 33us (63us total) trans 0 rc 0/0
00000100:00100000:3:1197564324.597542:0:7119:0:(service.c: 
1048:ptlrpc_server_handle_request()) Handled RPC pname:cluuid 
+ref:pid:xid:nid:opc ll_ost_124:87702303- 
a9bb-4c4f-6494-392157b9b9f7+6:4024:x342:12345-192.168.64.102 at o2ib:400
00000100:00100000:1:1197564324.597543:0:7004:0:(service.c: 
1048:ptlrpc_server_handle_request()) Handled RPC pname:cluuid 
+ref:pid:xid:nid:opc ll_ost_09:87702303- 
a9bb-4c4f-6494-392157b9b9f7+6:4024:x341:12345-192.168.64.102 at o2ib:400
00000100:00100000:3:1197564324.597547:0:7119:0:(service.c: 
1076:ptlrpc_server_handle_request()) request x342 opc 400 from  
12345-192.168.64.102 at o2ib processed in 39us (78us total) trans 0 rc 0/0
00000100:00100000:1:1197564324.597548:0:7004:0:(service.c: 
1076:ptlrpc_server_handle_request()) request x341 opc 400 from  
12345-192.168.64.102 at o2ib processed in 42us (96us total) trans 0 rc 0/0
00000100:00100000:0:1197564324.597550:0:7001:0:(service.c: 
1032:ptlrpc_server_handle_request()) Handling RPC pname:cluuid 
+ref:pid:xid:nid:opc ll_ost_06:87702303- 
a9bb-4c4f-6494-392157b9b9f7+31:4025:x344:12345-192.168.64.102 at o2ib:8
00000010:00000002:0:1197564324.597553:0:7001:0:(ost_handler.c: 
1489:ost_handle()) connect
00010000:00000400:0:1197564324.597565:0:7001:0:(ldlm_lib.c: 
515:target_handle_reconnect()) data-OST0005: 87702303- 
a9bb-4c4f-6494-392157b9b9f7 reconnecting
00010000:00000400:0:1197564324.597568:0:7001:0:(ldlm_lib.c: 
744:target_handle_connect()) data-OST0005: refuse reconnection from 87702303-a9bb-4c4f-6494-392157b9b9f7 at 192.168.64.102 
@o2ib to 0xffff810401c96000; still busy with 9 active RPCs
00010000:00020000:0:1197564324.598221:0:7001:0:(ldlm_lib.c: 
1458:target_send_reply_msg()) @@@ processing error (-16)   
req at ffff810402a64a00 x344/t0 o8->87702303-a9bb-4c4f-6494-392157b9b9f7@: 
0/0 lens 304/200 e 0 to 0 dl 1197564424 ref 1 fl Interpret:/0/0 rc -16/0
00000100:00100000:0:1197564324.598232:0:7001:0:(service.c: 
1048:ptlrpc_server_handle_request()) Handled RPC pname:cluuid 
+ref:pid:xid:nid:opc ll_ost_06:87702303- 
a9bb-4c4f-6494-392157b9b9f7+31:4025:x344:12345-192.168.64.102 at o2ib:8
00000100:00100000:0:1197564324.598237:0:7001:0:(service.c: 
1076:ptlrpc_server_handle_request()) request x344 opc 8 from  
12345-192.168.64.102 at o2ib processed in 689us (721us total) trans 0 rc  
-16/-16
00000100:00100000:2:1197564336.503728:0:6894:0:(client.c: 
925:ptlrpc_send_new_req()) Sending RPC pname:cluuid:pid:xid:nid:opc  
ptlrpcd:8a71c591-7623-dd81- 
fbc5-47fb75e4b360:6894:114:192.168.64.70 at o2ib:400
00000100:00100000:2:1197564336.503851:0:6894:0:(events.c: 
104:reply_in_callback()) @@@ unlink  req at ffff8104029dc400 x114/t0 o400- 
 >MGS at MGC192.168.64.70@o2ib_0:26/25 lens 128/256 e 0 to 6 dl  
1197564343 ref 1 fl Interpret:RN/8/0 rc 0/0
00000100:00100000:2:1197564336.503860:0:6894:0:(client.c: 
1158:ptlrpc_check_set()) Completed RPC pname:cluuid:pid:xid:nid:opc  
ptlrpcd:8a71c591-7623-dd81- 
fbc5-47fb75e4b360:6894:114:192.168.64.70 at o2ib:400
Debug log: 269 lines, 269 kept, 0 dropped.

I don't see any insight here.



On Dec 13, 2007, at 11:36 AM, Oleg Drokin wrote:

> echo +vfstrace +rpctrace +inode >/proc/sys/lnet/debug

Aaron Knister
Associate Systems Administrator/Web Designer
Center for Research on Environment and Water

(301) 595-7001
aaron at iges.org






More information about the lustre-discuss mailing list