[Lustre-discuss] Early replies from pre-1.6.5 servers?

Daniel Kobras kobras at linux.de
Mon Jul 28 07:50:51 PDT 2008


Hi!

While debugging connection problems on a Lustre client running 1.6.5.1
on RHEL5, I discovered early replies in the client debug output.
Adaptive timeouts are disabled on the client, and our server
infrastructure in running stock 1.6.4.3 (RHEL5), still. As far as I
understood, early replies should only occur post 1.6.5 when adaptive
timeouts are active.  Did I get this wrong, and early replies are
prefectly valid even in our setup? Or is the 1.6.5.1 client
misinterpreting message headers?

I'm attaching a debug trace for a single XID on the client and the
MDS/MGS.

Thanks,

Daniel.

-------------- next part --------------
==========> client (10.1.2.123 at o2ib, Lustre 1.6.5.1) <==========

root at client123 # lctl debug_file /tmp/lustre-debug.client.log | grep 266023
00000100:00100000:2:1217244344.471204:0:9991:0:(client.c:1784:ptlrpc_queue_wait()) Sending RPC pname:cluuid:pid:xid:nid:opc ll_cfg_requeue:38cc9155-5e64-1d01-bbf2-8b621120e1b0:9991:x266023:10.1.2.3 at o2ib:101
00000100:00000200:2:1217244344.471250:0:9991:0:(niobuf.c:540:ptl_send_rpc()) Setup reply buffer: 368 bytes, xid 266023, portal 25
00000100:00000040:2:1217244344.471269:0:9991:0:(niobuf.c:561:ptl_send_rpc()) @@@ send flg=0  req at ffff810194886a00 x266023/t0 o101->MGS at MGC10.1.2.3@o2ib_0:26/25 lens 232/368 e 0 to 100 dl 1217244444 ref 2 fl Rpc:/0/0 rc 0/0
00000100:00000200:2:1217244344.471299:0:9991:0:(niobuf.c:70:ptl_send_buf()) Sending 232 bytes to portal 26, xid 266023, offset 0
00000100:00000200:2:1217244344.471379:0:9991:0:(client.c:1871:ptlrpc_queue_wait()) @@@ -- sleeping for 100000 ticks  req at ffff810194886a00 x266023/t0 o101->MGS at MGC10.1.2.3@o2ib_0:26/25 lens 232/368 e 0 to 100 dl 1217244444 ref 2 fl Rpc:/0/0 rc 0/0
00000100:00000200:2:1217244344.471399:0:9991:0:(client.c:771:ptlrpc_check_reply()) @@@ rc = 0 for  req at ffff810194886a00 x266023/t0 o101->MGS at MGC10.1.2.3@o2ib_0:26/25 lens 232/368 e 0 to 100 dl 1217244444 ref 2 fl Rpc:/0/0 rc 0/0
00000100:00000200:2:1217244344.471416:0:9991:0:(client.c:771:ptlrpc_check_reply()) @@@ rc = 0 for  req at ffff810194886a00 x266023/t0 o101->MGS at MGC10.1.2.3@o2ib_0:26/25 lens 232/368 e 0 to 100 dl 1217244444 ref 2 fl Rpc:/0/0 rc 0/0
00000100:00000200:1:1217244344.471440:0:3049:0:(events.c:55:request_out_callback()) @@@ type 4, status 0  req at ffff810194886a00 x266023/t0 o101->MGS at MGC10.1.2.3@o2ib_0:26/25 lens 232/368 e 0 to 100 dl 1217244444 ref 2 fl Rpc:/0/0 rc 0/0
00000100:00000040:1:1217244344.471458:0:3049:0:(client.c:1526:__ptlrpc_req_finished()) @@@ refcount now 1  req at ffff810194886a00 x266023/t0 o101->MGS at MGC10.1.2.3@o2ib_0:26/25 lens 232/368 e 0 to 100 dl 1217244444 ref 2 fl Rpc:/0/0 rc 0/0
00000100:00000200:1:1217244344.471565:0:3049:0:(events.c:84:reply_in_callback()) @@@ type 1, status 0  req at ffff810194886a00 x266023/t0 o101->MGS at MGC10.1.2.3@o2ib_0:26/25 lens 232/368 e 0 to 100 dl 1217244444 ref 1 fl Rpc:/0/0 rc 0/0
00000100:00001000:1:1217244344.471578:0:3049:0:(events.c:112:reply_in_callback()) @@@ Early reply received: mlen=240 offset=0 replen=368 replied=0 unlinked=0  req at ffff810194886a00 x266023/t0 o101->MGS at MGC10.1.2.3@o2ib_0:26/25 lens 232/368 e 0 to 100 dl 1217244444 ref 1 fl Rpc:/0/0 rc 0/0
00000100:00000200:2:1217244344.471652:0:9991:0:(client.c:771:ptlrpc_check_reply()) @@@ rc = 0 for  req at ffff810194886a00 x266023/t0 o101->MGS at MGC10.1.2.3@o2ib_0:26/25 lens 232/368 e 1 to 100 dl 1217244444 ref 1 fl Rpc:/0/0 rc 0/0
00000100:00000200:2:1217244344.471668:0:9991:0:(client.c:771:ptlrpc_check_reply()) @@@ rc = 0 for  req at ffff810194886a00 x266023/t0 o101->MGS at MGC10.1.2.3@o2ib_0:26/25 lens 232/368 e 1 to 100 dl 1217244444 ref 1 fl Rpc:/0/0 rc 0/0
00000100:00000100:2:1217244444.471354:0:9991:0:(client.c:1198:ptlrpc_expire_one_request()) @@@ timeout (sent at 1217244344, 100s ago)  req at ffff810194886a00 x266023/t0 o101->MGS at MGC10.1.2.3@o2ib_0:26/25 lens 232/368 e 1 to 100 dl 1217244444 ref 1 fl Rpc:/0/0 rc 0/0
00000100:02000400:2:1217244444.471376:0:9991:0:(client.c:1206:ptlrpc_expire_one_request()) Request x266023 sent from MGC10.1.2.3 at o2ib to NID 10.1.2.3 at o2ib 100s ago has timed out (limit 100s).
00000100:00000200:2:1217244444.471845:0:9991:0:(events.c:84:reply_in_callback()) @@@ type 5, status 0  req at ffff810194886a00 x266023/t0 o101->MGS at MGC10.1.2.3@o2ib_0:26/25 lens 232/368 e 1 to 100 dl 1217244444 ref 1 fl Rpc:X/0/0 rc 0/0
00000100:00100000:2:1217244444.471859:0:9991:0:(events.c:102:reply_in_callback()) @@@ unlink  req at ffff810194886a00 x266023/t0 o101->MGS at MGC10.1.2.3@o2ib_0:26/25 lens 232/368 e 1 to 100 dl 1217244444 ref 1 fl Rpc:X/0/0 rc 0/0
00000100:00100000:2:1217244444.471970:0:9991:0:(client.c:2091:ptlrpc_abort_inflight()) @@@ inflight  req at ffff810194886a00 x266023/t0 o101->MGS at MGC10.1.2.3@o2ib_0:26/25 lens 232/368 e 1 to 100 dl 1217244444 ref 1 fl Rpc:X/0/0 rc 0/0
00000100:00000200:2:1217244444.472014:0:9991:0:(client.c:771:ptlrpc_check_reply()) @@@ rc = 1 for  req at ffff810194886a00 x266023/t0 o101->MGS at MGC10.1.2.3@o2ib_0:26/25 lens 232/368 e 1 to 100 dl 1217244444 ref 1 fl Rpc:EX/0/0 rc -4/0
00000100:00100000:2:1217244444.472029:0:9991:0:(client.c:1891:ptlrpc_queue_wait()) Completed RPC pname:cluuid:pid:xid:nid:opc ll_cfg_requeue:38cc9155-5e64-1d01-bbf2-8b621120e1b0:9991:x266023:10.1.2.3 at o2ib:101
00000100:00100000:2:1217244444.472038:0:9991:0:(client.c:1905:ptlrpc_queue_wait()) @@@ err rc=0 status=-4  req at ffff810194886a00 x266023/t0 o101->MGS at MGC10.1.2.3@o2ib_0:26/25 lens 232/368 e 1 to 100 dl 1217244444 ref 1 fl Rpc:EX/0/0 rc -4/0
00000100:00000040:2:1217244444.472229:0:9991:0:(client.c:1526:__ptlrpc_req_finished()) @@@ refcount now 0  req at ffff810194886a00 x266023/t0 o101->MGS at MGC10.1.2.3@o2ib_0:26/25 lens 232/368 e 1 to 100 dl 1217244444 ref 1 fl Interpret:EX/0/0 rc -4/0

root at client123 # ps xa | grep -E '3049|9991' | grep -v grep
 3049 ?        S      0:00 [kiblnd_sd_01]
 9991 ?        S      0:00 [ll_cfg_requeue]

==========> MDS (10.1.2.3 at o2ib, Lustre 1.6.4.3) <==========

root at mds # lctl debug_file /tmp/lustre-debug.mds.log | grep 266023
20000000:01000000:2:1217244344.469271:0:5378:0:(mgs_handler.c:549:mgs_handle()) @@@ enqueue  req at ffff810107882050 x266023/t0 o101->38cc9155-5e64-1d01-bbf2-8b621120e1b0 at NET_0x500000a01027b_UUID:-1 lens 232/0 ref 0 fl Interpret:/0/0 rc 0/0

root at mds # ps xa | grep 5378 | grep -v grep
 5378 ?        S      2:23 [ll_mgs_01]


More information about the lustre-discuss mailing list