[Lustre-discuss] client randomly evicted

Aaron S. Knister aaron at iges.org
Wed Apr 30 08:16:29 PDT 2008


I have a Lustre client that was evicted unexpectedly early this morning. The errors from dmesg are below. It's running over InfiniBand. I couldn't find any InfiniBand errors, and all the MDS/MGS and OSSes logged was "haven't heard from client xyz in 2277 seconds. Evicting". The client has partially come back and now shows this:


aaron@cola10:~ $ lfs df -h 
UUID bytes Used Available Use% Mounted on 
data-MDT0000_UUID 87.5G 6.4G 81.1G 7% /data[MDT:0] 
data-OST0000_UUID 5.4T 4.9T 439.6G 92% /data[OST:0] 
data-OST0001_UUID : inactive device 
data-OST0002_UUID : inactive device 
data-OST0003_UUID : inactive device 
data-OST0004_UUID : inactive device 
data-OST0005_UUID : inactive device 
data-OST0006_UUID : inactive device 
data-OST0007_UUID : inactive device 
data-OST0008_UUID : inactive device 
data-OST0009_UUID : inactive device 

filesystem summary: 5.4T 4.9T 439.6G 92% /data 

So it's reconnected to only one of the 10 OSTs. I tried to do an lctl --device {device} reconnect and it said "Error: Operation in progress". I have no idea what went wrong. I'm confident a reboot would fix it, but I'd like to avoid that if possible. 
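
For anyone who wants to script a check for this state, here is a minimal Python sketch that picks the inactive targets out of `lfs df` output. It assumes the text format shown above; the function name `inactive_targets` and the abbreviated `SAMPLE` string are made up for illustration and are not part of any Lustre tool.

```python
import re

# Abbreviated copy of the `lfs df -h` output shown above (illustration only).
SAMPLE = """\
UUID bytes Used Available Use% Mounted on
data-MDT0000_UUID 87.5G 6.4G 81.1G 7% /data[MDT:0]
data-OST0000_UUID 5.4T 4.9T 439.6G 92% /data[OST:0]
data-OST0001_UUID : inactive device
data-OST0002_UUID : inactive device
"""

# Matches lines such as "data-OST0001_UUID : inactive device".
INACTIVE = re.compile(r"^(\S+_UUID)\s*:\s*inactive device\s*$")


def inactive_targets(lfs_df_text):
    """Return the target UUIDs that lfs df reports as inactive (hypothetical helper)."""
    found = []
    for line in lfs_df_text.splitlines():
        match = INACTIVE.match(line.strip())
        if match:
            found.append(match.group(1))
    return found


if __name__ == "__main__":
    for uuid in inactive_targets(SAMPLE):
        print(uuid)
```

On a real client the text would come from running `lfs df` (for example via `subprocess.run`), which is left out here so the sketch stays self-contained.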


Thanks in advance. 

LustreError: 11-0: an error occurred while communicating with 192.168.64.70 at o2ib. The mds_statfs operation failed with -107 
Lustre: data-MDT0000-mdc-ffff81013037b800: Connection to service data-MDT0000 via nid 192.168.64.70 at o2ib was lost; in progress operations using this service will wait for recovery to complete. 
LustreError: 167-0: This client was evicted by data-MDT0000; in progress operations using this service will fail. 
LustreError: 22345:0:(llite_lib.c:1508:ll_statfs_internal()) mdc_statfs fails: rc = -5 
LustreError: 22396:0:(client.c:519:ptlrpc_import_delay_req()) @@@ IMP_INVALID req at ffff810136334400 x81717113/t0 o41->data-MDT0000_UUID at 192.168.64.70@o2ib:12 lens 128/272 ref 1 fl Rpc:/0/0 rc 0/0 
LustreError: 22396:0:(llite_lib.c:1508:ll_statfs_internal()) mdc_statfs fails: rc = -108 
LustreError: 22454:0:(client.c:519:ptlrpc_import_delay_req()) @@@ IMP_INVALID req at ffff8101136d2000 x81717114/t0 o41->data-MDT0000_UUID at 192.168.64.70@o2ib:12 lens 128/272 ref 1 fl Rpc:/0/0 rc 0/0 
LustreError: 22454:0:(llite_lib.c:1508:ll_statfs_internal()) mdc_statfs fails: rc = -108 
LustreError: 22463:0:(client.c:519:ptlrpc_import_delay_req()) @@@ IMP_INVALID req at ffff810024ee4c00 x81717115/t0 o41->data-MDT0000_UUID at 192.168.64.70@o2ib:12 lens 128/272 ref 1 fl Rpc:/0/0 rc 0/0 
LustreError: 22463:0:(llite_lib.c:1508:ll_statfs_internal()) mdc_statfs fails: rc = -108 
LustreError: 22734:0:(client.c:519:ptlrpc_import_delay_req()) @@@ IMP_INVALID req at ffff8101316c8200 x81717138/t0 o41->data-MDT0000_UUID at 192.168.64.70@o2ib:12 lens 128/272 ref 1 fl Rpc:/0/0 rc 0/0 
LustreError: 22734:0:(llite_lib.c:1508:ll_statfs_internal()) mdc_statfs fails: rc = -108 
LustreError: 22736:0:(client.c:519:ptlrpc_import_delay_req()) @@@ IMP_INVALID req at ffff8101136d2c00 x81717139/t0 o41->data-MDT0000_UUID at 192.168.64.70@o2ib:12 lens 128/272 ref 1 fl Rpc:/0/0 rc 0/0 
LustreError: 22736:0:(llite_lib.c:1508:ll_statfs_internal()) mdc_statfs fails: rc = -108 
LustreError: 22912:0:(client.c:519:ptlrpc_import_delay_req()) @@@ IMP_INVALID req at ffff8101136d2c00 x81717140/t0 o41->data-MDT0000_UUID at 192.168.64.70@o2ib:12 lens 128/272 ref 1 fl Rpc:/0/0 rc 0/0 
LustreError: 22912:0:(llite_lib.c:1508:ll_statfs_internal()) mdc_statfs fails: rc = -108 
LustreError: 22971:0:(client.c:519:ptlrpc_import_delay_req()) @@@ IMP_INVALID req at ffff81012cebb000 x81717143/t0 o41->data-MDT0000_UUID at 192.168.64.70@o2ib:12 lens 128/272 ref 1 fl Rpc:/0/0 rc 0/0 
LustreError: 22971:0:(client.c:519:ptlrpc_import_delay_req()) Skipped 2 previous similar messages 
LustreError: 22971:0:(llite_lib.c:1508:ll_statfs_internal()) mdc_statfs fails: rc = -108 
LustreError: 22971:0:(llite_lib.c:1508:ll_statfs_internal()) Skipped 2 previous similar messages 
LustreError: 23781:0:(client.c:519:ptlrpc_import_delay_req()) @@@ IMP_INVALID req at ffff81012bd02000 x81717144/t0 o41->data-MDT0000_UUID at 192.168.64.70@o2ib:12 lens 128/272 ref 1 fl Rpc:/0/0 rc 0/0 
LustreError: 23781:0:(llite_lib.c:1508:ll_statfs_internal()) mdc_statfs fails: rc = -108 
LustreError: 23796:0:(client.c:519:ptlrpc_import_delay_req()) @@@ IMP_INVALID req at ffff81006c776000 x81717156/t0 o41->data-MDT0000_UUID at 192.168.64.70@o2ib:12 lens 128/272 ref 1 fl Rpc:/0/0 rc 0/0 
LustreError: 23827:0:(client.c:519:ptlrpc_import_delay_req()) @@@ IMP_INVALID req at ffff81013cbae400 x81717157/t0 o41->data-MDT0000_UUID at 192.168.64.70@o2ib:12 lens 128/272 ref 1 fl Rpc:/0/0 rc 0/0 
LustreError: 23827:0:(llite_lib.c:1508:ll_statfs_internal()) mdc_statfs fails: rc = -108 
LustreError: 23827:0:(llite_lib.c:1508:ll_statfs_internal()) Skipped 1 previous similar message 
LustreError: 22346:0:(client.c:519:ptlrpc_import_delay_req()) @@@ IMP_INVALID req at ffff8100a5f3d400 x81717169/t0 o35->data-MDT0000_UUID at 192.168.64.70@o2ib:12 lens 296/896 ref 1 fl Rpc:/0/0 rc 0/0 
LustreError: 22346:0:(file.c:97:ll_close_inode_openhandle()) inode 21601226 mdc close failed: rc = -108 
Lustre: data-MDT0000-mdc-ffff81013037b800: Connection restored to service data-MDT0000 using nid 192.168.64.70 at o2ib. 
LustreError: 11-0: an error occurred while communicating with 192.168.64.71 at o2ib. The ost_statfs operation failed with -107 
Lustre: data-OST0001-osc-ffff81013037b800: Connection to service data-OST0001 via nid 192.168.64.71 at o2ib was lost; in progress operations using this service will wait for recovery to complete. 
LustreError: 11-0: an error occurred while communicating with 192.168.64.71 at o2ib. The ost_statfs operation failed with -107 
LustreError: 167-0: This client was evicted by data-OST0001; in progress operations using this service will fail. 
LustreError: 167-0: This client was evicted by data-OST0002; in progress operations using this service will fail. 
LustreError: 24093:0:(llite_lib.c:1520:ll_statfs_internal()) obd_statfs fails: rc = -5 
Lustre: data-OST0000-osc-ffff81013037b800: Connection restored to service data-OST0000 using nid 192.168.64.71 at o2ib. 
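
The return codes in the log above are negated Linux errno values: -5 is EIO, -107 is ENOTCONN and -108 is ESHUTDOWN. As a rough aid for reading a long dmesg dump, the following Python sketch tallies them. The helper name `tally_errors`, the regular expression, and the shortened `SAMPLE` are assumptions made for illustration; only the two message shapes seen in this log ("rc = -N" and "failed with -N") are handled.

```python
import re
from collections import Counter

# Linux errno names for the codes that appear in the log above.
ERRNO_NAMES = {5: "EIO", 107: "ENOTCONN", 108: "ESHUTDOWN"}

# A few lines copied from the log above (illustration only).
SAMPLE = """\
LustreError: 11-0: an error occurred while communicating with 192.168.64.70 at o2ib. The mds_statfs operation failed with -107
LustreError: 22345:0:(llite_lib.c:1508:ll_statfs_internal()) mdc_statfs fails: rc = -5
LustreError: 22396:0:(llite_lib.c:1508:ll_statfs_internal()) mdc_statfs fails: rc = -108
LustreError: 22454:0:(llite_lib.c:1508:ll_statfs_internal()) mdc_statfs fails: rc = -108
"""

# Captures N from "rc = -N" or "failed with -N"; "rc 0/0" is not matched.
RC_PATTERN = re.compile(r"(?:rc = |failed with )-(\d+)")


def tally_errors(log_text):
    """Count negative return codes in the log by errno name (hypothetical helper)."""
    counts = Counter()
    for match in RC_PATTERN.finditer(log_text):
        code = int(match.group(1))
        # Fall back to the bare number for codes not in the small table above.
        counts[ERRNO_NAMES.get(code, str(code))] += 1
    return counts


if __name__ == "__main__":
    for name, count in tally_errors(SAMPLE).most_common():
        print(name, count)
```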
