[Lustre-discuss] mds_close operation failed with -107

Aaron Knister aaron at iges.org
Fri Feb 15 10:44:11 PST 2008


What does an ifconfig of the mds and clients look like? Can you run  
"ibcheckerrors -v" on an infiniband node and send the output? Also  
send the output of "ibqueryerrors.pl". Thanks!

On Feb 15, 2008, at 1:15 PM, Craig Prescott wrote:

>
> Hi;
>
> We had a couple of our Lustre clients get horked last night (one
> o2ib, and one tcp), df on the clients reported 'Input/output error'
> for the Lustre fs.
>
> Here's what was in the syslog of our MDS/MGS regarding the two
> nodes:
>
> Feb 14 22:50:43 hpcmds kernel: LustreError:
> 26165:0:(handler.c:1498:mds_handle()) operation 35 on unconnected MDS
> from 12345-10.13.18.156 at tcp
> Feb 14 22:50:43 hpcmds kernel: LustreError:
> 26165:0:(ldlm_lib.c:1437:target_send_reply_msg()) @@@ processing error
> (-107)  req at ffff81011fa81400 x7753405/t0 o35-><?>@<?>:-1 lens 296/0  
> ref
> 0 fl Interpret:/0/0 rc -107/0
> Feb 14 22:50:43 hpcmds kernel: LustreError:
> 26165:0:(ldlm_lib.c:1437:target_send_reply_msg()) Skipped 9 previous
> similar messages
> Feb 14 23:01:54 hpcmds kernel: LustreError:
> 31054:0:(handler.c:1498:mds_handle()) operation 35 on unconnected MDS
> from 12345-10.13.26.125 at o2ib
> Feb 14 23:01:54 hpcmds kernel: LustreError:
> 31054:0:(ldlm_lib.c:1437:target_send_reply_msg()) @@@ processing error
> (-107)  req at ffff810123416c00 x73394313/t0 o35-><?>@<?>:-1 lens 296/0  
> ref
> 0 fl Interpret:/0/0 rc -107/0
>
> The corresponding syslog from one of the clients is appended below
> (they were very similar).
>
> Does anyone recognize this?  It says ENOTCONN, but there is no  
> evidence
> of anything being wrong with our ethernet or IB networks.  There  
> aren't
> any locking complaints.
>
> We are running Lustre 1.6.3 (plus a couple of patches) with a combo
> MGS/MDS.  Kernel is a fully patched 2.6.18-8.1.14el5 everywhere.
>
> Thanks,
> Craig
>
>
> Feb 14 23:01:54 r5b-s14 kernel: LustreError: 11-0: an error occurred
> while communicating with 10.13.24.40 at o2ib. The mds_close operation
> failed with -107
> Feb 14 23:01:54 r5b-s14 kernel: Lustre:
> ufhpc-MDT0000-mdc-ffff81022dfac400: Connection to service ufhpc- 
> MDT0000
> via nid 10.13.24.40 at o2ib was lost; in progress operations using this
> service will wait for recovery to complete.
> Feb 14 23:01:54 r5b-s14 kernel: Lustre: Skipped 5 previous similar  
> messages
> Feb 14 23:01:54 r5b-s14 kernel: LustreError:
> 17647:0:(file.c:97:ll_close_inode_openhandle()) inode 41118870 mdc  
> close
> failed: rc = -4
> Feb 14 23:01:54 r5b-s14 kernel: LustreError: 167-0: This client was
> evicted by ufhpc-MDT0000; in progress operations using this service  
> will
> fail.
> Feb 14 23:01:54 r5b-s14 kernel: LustreError:
> 15785:0:(client.c:519:ptlrpc_import_delay_req()) @@@ IMP_INVALID
> req at ffff81020f386400 x73394323/t0
> o35->ufhpc-MDT0000_UUID at 10.13.24.40@o2ib:12 lens 296/1736 ref 1 fl
> Rpc:/0/0 rc 0/0
> Feb 14 23:01:54 r5b-s14 kernel: LustreError:
> 15785:0:(file.c:97:ll_close_inode_openhandle()) inode 24445371 mdc  
> close
> failed: rc = -108
> Feb 14 23:01:54 r5b-s14 kernel: LustreError:
> 17719:0:(client.c:519:ptlrpc_import_delay_req()) @@@ IMP_INVALID
> req at ffff8102166c0000 x73394324/t0
> o35->ufhpc-MDT0000_UUID at 10.13.24.40@o2ib:12 lens 296/1736 ref 1 fl
> Rpc:/0/0 rc 0/0
> Feb 14 23:01:54 r5b-s14 kernel: LustreError:
> 15785:0:(file.c:97:ll_close_inode_openhandle()) Skipped 12 previous
> similar messages
> Feb 14 23:01:54 r5b-s14 kernel: LustreError:
> 17640:0:(client.c:519:ptlrpc_import_delay_req()) @@@ IMP_INVALID
> req at ffff81020f386400 x73394331/t0
> o35->ufhpc-MDT0000_UUID at 10.13.24.40@o2ib:12 lens 296/1736 ref 1 fl
> Rpc:/0/0 rc 0/0
> Feb 14 23:01:54 r5b-s14 kernel: LustreError:
> 17640:0:(client.c:519:ptlrpc_import_delay_req()) Skipped 6 previous
> similar messages
> Feb 14 23:01:54 r5b-s14 kernel: LustreError:
> 17640:0:(file.c:97:ll_close_inode_openhandle()) inode 41118867 mdc  
> close
> failed: rc = -108
> Feb 14 23:01:54 r5b-s14 kernel: LustreError:
> 17640:0:(file.c:97:ll_close_inode_openhandle()) Skipped 3 previous
> similar messages
> Feb 14 23:01:54 r5b-s14 kernel: LustreError:
> 17254:0:(client.c:519:ptlrpc_import_delay_req()) @@@ IMP_INVALID
> req at ffff81015e0e0000 x73394332/t0
> o35->ufhpc-MDT0000_UUID at 10.13.24.40@o2ib:12 lens 296/1736 ref 1 fl
> Rpc:/0/0 rc 0/0
> Feb 14 23:01:54 r5b-s14 kernel: LustreError:
> 17254:0:(file.c:97:ll_close_inode_openhandle()) inode 42611428 mdc  
> close
> failed: rc = -108
> Feb 14 23:01:54 r5b-s14 kernel: LustreError:
> 17254:0:(file.c:97:ll_close_inode_openhandle()) Skipped 2 previous
> similar messages
> Feb 14 23:01:54 r5b-s14 kernel: LustreError:
> 17323:0:(client.c:519:ptlrpc_import_delay_req()) @@@ IMP_INVALID
> req at ffff8101c63af800 x73394341/t0
> o35->ufhpc-MDT0000_UUID at 10.13.24.40@o2ib:12 lens 296/1736 ref 1 fl
> Rpc:/0/0 rc 0/0
> Feb 14 23:01:54 r5b-s14 kernel: LustreError:
> 17323:0:(client.c:519:ptlrpc_import_delay_req()) Skipped 8 previous
> similar messages
> Feb 14 23:01:54 r5b-s14 kernel: LustreError:
> 17323:0:(file.c:97:ll_close_inode_openhandle()) inode 24249026 mdc  
> close
> failed: rc = -108
> Feb 14 23:01:54 r5b-s14 kernel: LustreError:
> 17323:0:(file.c:97:ll_close_inode_openhandle()) Skipped 11 previous
> similar messages
> Feb 14 23:02:35 r5b-s14 kernel: LustreError:
> 5746:0:(client.c:519:ptlrpc_import_delay_req()) @@@ IMP_INVALID
> req at ffff81014dc04200 x73394350/t0
> o35->ufhpc-MDT0000_UUID at 10.13.24.40@o2ib:12 lens 296/1736 ref 1 fl
> Rpc:/0/0 rc 0/0
> Feb 14 23:02:35 r5b-s14 kernel: LustreError:
> 5746:0:(client.c:519:ptlrpc_import_delay_req()) Skipped 8 previous
> similar messages
> Feb 14 23:02:35 r5b-s14 kernel: LustreError:
> 5746:0:(file.c:97:ll_close_inode_openhandle()) inode 42195621 mdc  
> close
> failed: rc = -108
> Feb 14 23:02:35 r5b-s14 kernel: LustreError:
> 5746:0:(file.c:97:ll_close_inode_openhandle()) Skipped 2 previous
> similar messages
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss

Aaron Knister
Associate Systems Analyst
Center for Ocean-Land-Atmosphere Studies

(301) 595-7000
aaron at iges.org







More information about the lustre-discuss mailing list