[Lustre-discuss] MDT connection refusal: still busy with 2 active RPCs

Thomas Roth t.roth at gsi.de
Thu Apr 9 10:17:10 PDT 2009


Hi all,

Our cluster is becoming increasingly unusable because of refused
connections. Typical log entries on the MDS look like this:

ldlm_lib.ctarget_handle_connect lustre-MDT0000: refuse reconnection from
77cbd453-ee72-fe75-cb06-c49179e0a011@Lustre-Client@tcp to
0xffff810111341000; still busy with 2 active RPCs

These messages are surrounded by a growing number of "triggered
watchdogs" and log dumps, which contain much the same information as
/var/log/kern.log.
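
To see which clients are refused most often, the entries in kern.log can
be tallied per client. The following is only a minimal sketch: it assumes
each message sits on a single line in the raw log and has the same shape
as the entry quoted above (client UUID, then the client NID, then the
RPC count). The names count_refusals and report are made up for this
illustration and are not part of Lustre.

```python
import re
from collections import Counter

# Assumed single-line shape of the message, modelled on the entry quoted
# above. The separator between UUID and NID is taken to be "@"; " at " is
# also accepted in case the log was copied from a mail archive.
BUSY_RE = re.compile(
    r"refuse reconnection from (?P<uuid>\S+?)(?:@| at )(?P<nid>\S+) to \S+; "
    r"still busy with (?P<rpcs>\d+) active RPCs"
)


def count_refusals(lines):
    """Count refused reconnections per (client UUID, client NID)."""
    counts = Counter()
    for line in lines:
        match = BUSY_RE.search(line)
        if match:
            counts[(match.group("uuid"), match.group("nid"))] += 1
    return counts


def report(path):
    """Print clients ordered by number of refusals, most frequent first."""
    with open(path, errors="replace") as log:
        for (uuid, nid), n in count_refusals(log).most_common():
            print(n, uuid, nid)
```

Calling report("/var/log/kern.log") on the MDS would then list the
clients with the most refused reconnections first.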

I have gone through every hit Google gave me for "busy with N active
RPCs", but found no conclusive answer as to what causes this behavior
and, more importantly, how to fix it.
This time all connectivity to the MDT was eventually lost, so I had to
restart the MDS.

Thanks,
Thomas


