[Lustre-discuss] mds_close operation failed with -107

Craig Prescott prescott at hpc.ufl.edu
Fri Feb 15 10:15:12 PST 2008


Hi;

We had a couple of our Lustre clients get horked last night (one
o2ib, and one tcp), df on the clients reported 'Input/output error'
for the Lustre fs.

Here's what was in the syslog of our MDS/MGS regarding the two
nodes:

Feb 14 22:50:43 hpcmds kernel: LustreError: 
26165:0:(handler.c:1498:mds_handle()) operation 35 on unconnected MDS 
from 12345-10.13.18.156 at tcp
Feb 14 22:50:43 hpcmds kernel: LustreError: 
26165:0:(ldlm_lib.c:1437:target_send_reply_msg()) @@@ processing error 
(-107)  req at ffff81011fa81400 x7753405/t0 o35-><?>@<?>:-1 lens 296/0 ref 
0 fl Interpret:/0/0 rc -107/0
Feb 14 22:50:43 hpcmds kernel: LustreError: 
26165:0:(ldlm_lib.c:1437:target_send_reply_msg()) Skipped 9 previous 
similar messages
Feb 14 23:01:54 hpcmds kernel: LustreError: 
31054:0:(handler.c:1498:mds_handle()) operation 35 on unconnected MDS 
from 12345-10.13.26.125 at o2ib
Feb 14 23:01:54 hpcmds kernel: LustreError: 
31054:0:(ldlm_lib.c:1437:target_send_reply_msg()) @@@ processing error 
(-107)  req at ffff810123416c00 x73394313/t0 o35-><?>@<?>:-1 lens 296/0 ref 
0 fl Interpret:/0/0 rc -107/0

The corresponding syslog from one of the clients is appended below
(they were very similar).

Does anyone recognize this?  It says ENOTCONN, but there is no evidence
of anything being wrong with our ethernet or IB networks.  There aren't
any locking complaints.

We are running Lustre 1.6.3 (plus a couple of patches) with a combo
MGS/MDS.  Kernel is a fully patched 2.6.18-8.1.14el5 everywhere.

Thanks,
Craig


Feb 14 23:01:54 r5b-s14 kernel: LustreError: 11-0: an error occurred 
while communicating with 10.13.24.40 at o2ib. The mds_close operation 
failed with -107
Feb 14 23:01:54 r5b-s14 kernel: Lustre: 
ufhpc-MDT0000-mdc-ffff81022dfac400: Connection to service ufhpc-MDT0000 
via nid 10.13.24.40 at o2ib was lost; in progress operations using this 
service will wait for recovery to complete.
Feb 14 23:01:54 r5b-s14 kernel: Lustre: Skipped 5 previous similar messages
Feb 14 23:01:54 r5b-s14 kernel: LustreError: 
17647:0:(file.c:97:ll_close_inode_openhandle()) inode 41118870 mdc close 
failed: rc = -4
Feb 14 23:01:54 r5b-s14 kernel: LustreError: 167-0: This client was 
evicted by ufhpc-MDT0000; in progress operations using this service will 
fail.
Feb 14 23:01:54 r5b-s14 kernel: LustreError: 
15785:0:(client.c:519:ptlrpc_import_delay_req()) @@@ IMP_INVALID 
req at ffff81020f386400 x73394323/t0 
o35->ufhpc-MDT0000_UUID at 10.13.24.40@o2ib:12 lens 296/1736 ref 1 fl 
Rpc:/0/0 rc 0/0
Feb 14 23:01:54 r5b-s14 kernel: LustreError: 
15785:0:(file.c:97:ll_close_inode_openhandle()) inode 24445371 mdc close 
failed: rc = -108
Feb 14 23:01:54 r5b-s14 kernel: LustreError: 
17719:0:(client.c:519:ptlrpc_import_delay_req()) @@@ IMP_INVALID 
req at ffff8102166c0000 x73394324/t0 
o35->ufhpc-MDT0000_UUID at 10.13.24.40@o2ib:12 lens 296/1736 ref 1 fl 
Rpc:/0/0 rc 0/0
Feb 14 23:01:54 r5b-s14 kernel: LustreError: 
15785:0:(file.c:97:ll_close_inode_openhandle()) Skipped 12 previous 
similar messages
Feb 14 23:01:54 r5b-s14 kernel: LustreError: 
17640:0:(client.c:519:ptlrpc_import_delay_req()) @@@ IMP_INVALID 
req at ffff81020f386400 x73394331/t0 
o35->ufhpc-MDT0000_UUID at 10.13.24.40@o2ib:12 lens 296/1736 ref 1 fl 
Rpc:/0/0 rc 0/0
Feb 14 23:01:54 r5b-s14 kernel: LustreError: 
17640:0:(client.c:519:ptlrpc_import_delay_req()) Skipped 6 previous 
similar messages
Feb 14 23:01:54 r5b-s14 kernel: LustreError: 
17640:0:(file.c:97:ll_close_inode_openhandle()) inode 41118867 mdc close 
failed: rc = -108
Feb 14 23:01:54 r5b-s14 kernel: LustreError: 
17640:0:(file.c:97:ll_close_inode_openhandle()) Skipped 3 previous 
similar messages
Feb 14 23:01:54 r5b-s14 kernel: LustreError: 
17254:0:(client.c:519:ptlrpc_import_delay_req()) @@@ IMP_INVALID 
req at ffff81015e0e0000 x73394332/t0 
o35->ufhpc-MDT0000_UUID at 10.13.24.40@o2ib:12 lens 296/1736 ref 1 fl 
Rpc:/0/0 rc 0/0
Feb 14 23:01:54 r5b-s14 kernel: LustreError: 
17254:0:(file.c:97:ll_close_inode_openhandle()) inode 42611428 mdc close 
failed: rc = -108
Feb 14 23:01:54 r5b-s14 kernel: LustreError: 
17254:0:(file.c:97:ll_close_inode_openhandle()) Skipped 2 previous 
similar messages
Feb 14 23:01:54 r5b-s14 kernel: LustreError: 
17323:0:(client.c:519:ptlrpc_import_delay_req()) @@@ IMP_INVALID 
req at ffff8101c63af800 x73394341/t0 
o35->ufhpc-MDT0000_UUID at 10.13.24.40@o2ib:12 lens 296/1736 ref 1 fl 
Rpc:/0/0 rc 0/0
Feb 14 23:01:54 r5b-s14 kernel: LustreError: 
17323:0:(client.c:519:ptlrpc_import_delay_req()) Skipped 8 previous 
similar messages
Feb 14 23:01:54 r5b-s14 kernel: LustreError: 
17323:0:(file.c:97:ll_close_inode_openhandle()) inode 24249026 mdc close 
failed: rc = -108
Feb 14 23:01:54 r5b-s14 kernel: LustreError: 
17323:0:(file.c:97:ll_close_inode_openhandle()) Skipped 11 previous 
similar messages
Feb 14 23:02:35 r5b-s14 kernel: LustreError: 
5746:0:(client.c:519:ptlrpc_import_delay_req()) @@@ IMP_INVALID 
req at ffff81014dc04200 x73394350/t0 
o35->ufhpc-MDT0000_UUID at 10.13.24.40@o2ib:12 lens 296/1736 ref 1 fl 
Rpc:/0/0 rc 0/0
Feb 14 23:02:35 r5b-s14 kernel: LustreError: 
5746:0:(client.c:519:ptlrpc_import_delay_req()) Skipped 8 previous 
similar messages
Feb 14 23:02:35 r5b-s14 kernel: LustreError: 
5746:0:(file.c:97:ll_close_inode_openhandle()) inode 42195621 mdc close 
failed: rc = -108
Feb 14 23:02:35 r5b-s14 kernel: LustreError: 
5746:0:(file.c:97:ll_close_inode_openhandle()) Skipped 2 previous 
similar messages



More information about the lustre-discuss mailing list