[Lustre-discuss] Lustre locking up on login/interactive nodes

Brock Palen brockp at umich.edu
Mon Jul 21 08:43:39 PDT 2008


Every so often Lustre locks up. It recovers eventually. The affected
processes show up in 'D' state (uninterruptible I/O wait). In this
case it was 'ar' making an archive.
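
One way to confirm which processes are stuck is to scan /proc for
entries whose state field is 'D'. The following is only a minimal
sketch, assuming a Linux /proc filesystem; the function names are
illustrative and do not come from Lustre or from the original post.

```python
import os


def parse_stat_line(text):
    """Split a /proc/<pid>/stat line into (pid, command, state)."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so use the first '(' and the last ')'.
    open_idx = text.index("(")
    close_idx = text.rindex(")")
    pid = int(text[:open_idx])
    comm = text[open_idx + 1:close_idx]
    state = text[close_idx + 1:].split()[0]
    return pid, comm, state


def uninterruptible_processes(proc_root="/proc"):
    """Return sorted (pid, command) pairs for processes in state 'D'."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat_line(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or the line could not be parsed
        if state == "D":
            found.append((pid, comm))
    return sorted(found)


if __name__ == "__main__":
    for pid, comm in uninterruptible_processes():
        print(pid, comm)
```

Running this during a hang should list the blocked commands, which
can then be matched against the PIDs in the kernel log.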

Dmesg then shows:

Lustre: nobackup-MDT0000-mdc-00000101fc467800: Connection to service
nobackup-MDT0000 via nid 141.212.30.184@tcp was lost; in progress
operations using this service will wait for recovery to complete.
LustreError: 167-0: This client was evicted by nobackup-MDT0000; in  
progress operations using this service will fail.
LustreError: 17575:0:(client.c:519:ptlrpc_import_delay_req()) @@@
IMP_INVALID  req@0000010189e2f400 x912452/t0
o101->nobackup-MDT0000_UUID@141.212.30.184@tcp:12 lens 488/768 ref 1
fl Rpc:P/0/0 rc 0/0
LustreError: 17575:0:(mdc_locks.c:423:mdc_finish_enqueue())  
ldlm_cli_enqueue: -108
LustreError: 27076:0:(client.c:519:ptlrpc_import_delay_req()) @@@
IMP_INVALID  req@00000101ed528a00 x912464/t0
o101->nobackup-MDT0000_UUID@141.212.30.184@tcp:12 lens 440/768 ref 1
fl Rpc:/0/0 rc 0/0
LustreError: 27076:0:(mdc_locks.c:423:mdc_finish_enqueue())  
ldlm_cli_enqueue: -108
LustreError: 27489:0:(file.c:97:ll_close_inode_openhandle()) inode  
12653753 mdc close failed: rc = -108
LustreError: 27489:0:(file.c:97:ll_close_inode_openhandle()) inode  
12195682 mdc close failed: rc = -108
LustreError: 27489:0:(file.c:97:ll_close_inode_openhandle()) Skipped  
46 previous similar messages
Lustre: nobackup-MDT0000-mdc-00000101fc467800: Connection restored to
service nobackup-MDT0000 using nid 141.212.30.184@tcp.
LustreError: 11-0: an error occurred while communicating with
141.212.30.184@tcp. The mds_close operation failed with -116
LustreError: 11-0: an error occurred while communicating with
141.212.30.184@tcp. The mds_close operation failed with -116
LustreError: 26930:0:(file.c:97:ll_close_inode_openhandle()) inode  
11441446 mdc close failed: rc = -116
LustreError: 26930:0:(file.c:97:ll_close_inode_openhandle()) Skipped  
113 previous similar messages
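
The negative numbers in the log are kernel return codes. A minimal
sketch for tallying and naming them follows. It assumes standard
Linux errno numbering (108 is ESHUTDOWN, 116 is ESTALE); the table,
the pattern, and the function name are illustrative and not part of
Lustre.

```python
import re

# Assumption: standard Linux errno numbering; only the codes seen in the log.
LINUX_ERRNO_NAMES = {
    108: "ESHUTDOWN",  # cannot send after transport endpoint shutdown
    116: "ESTALE",     # stale file handle
}

# Matches the three places where the log prints a negative return code.
RC_PATTERN = re.compile(r"(?:rc = |failed with |ldlm_cli_enqueue: )-(\d+)")


def decode_return_codes(log_text):
    """Return sorted (code, name, count) tuples for codes found in log_text."""
    counts = {}
    for match in RC_PATTERN.finditer(log_text):
        code = int(match.group(1))
        counts[code] = counts.get(code, 0) + 1
    return [(code, LINUX_ERRNO_NAMES.get(code, "UNKNOWN"), n)
            for code, n in sorted(counts.items())]


# A few fragments copied from the dmesg output above.
SAMPLE = """\
ldlm_cli_enqueue: -108
12653753 mdc close failed: rc = -108
The mds_close operation failed with -116
11441446 mdc close failed: rc = -116
"""

if __name__ == "__main__":
    for code, name, n in decode_return_codes(SAMPLE):
        print(f"-{code} {name} seen {n} time(s)")
```

In the log, the -108 codes appear after the eviction message and the
-116 codes appear after the connection is restored.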


Are there special options that should be set on interactive/login
nodes?  I remember something about how much memory should be available
on login vs. batch nodes, but I don't know how to change that; I just
assumed Lustre would use it.  Login nodes have 8GB.
__________________________________________________
www.palen.serveftp.net
Center for Advanced Computing
http://cac.engin.umich.edu
brockp at umich.edu





