[Lustre-discuss] MDS can't recover OSTs

Charles Taylor taylor at hpc.ufl.edu
Thu Apr 28 09:47:02 PDT 2011


We had a RAID array barf this morning, resulting in some OST corruption
which appeared to be successfully repaired with a combination of fsck
and ll_recover_lost_found_objs. The OSTs mounted OK, but the MDS
can't seem to recover its connection to two of them: we are
seeing a continuing stream of the following in the MDS syslog.

Apr 28 11:37:54 crnmds kernel: Lustre: 31983:0:(recover.c:67:ptlrpc_initiate_recovery()) crn-OST0013_UUID: starting recovery
Apr 28 11:37:54 crnmds kernel: Lustre: 31983:0:(import.c:608:ptlrpc_connect_import()) ffff810117426000 crn-OST0013_UUID: changing import state from DISCONN to CONNECTING
Apr 28 11:37:54 crnmds kernel: Lustre: 31983:0:(import.c:470:import_select_connection()) crn-OST0013-osc: connect to NID 10.13.24.92@o2ib last attempt 22689204132
Apr 28 11:37:54 crnmds kernel: Lustre: 31983:0:(import.c:544:import_select_connection()) crn-OST0013-osc: import ffff810117426000 using connection 10.13.24.92@o2ib/10.13.24.92@o2ib
Apr 28 11:37:54 crnmds kernel: Lustre: 31982:0:(import.c:1091:ptlrpc_connect_interpret()) ffff810117426000 crn-OST0013_UUID: changing import state from CONNECTING to DISCONN
Apr 28 11:37:54 crnmds kernel: Lustre: 31982:0:(import.c:1137:ptlrpc_connect_interpret()) recovery of crn-OST0013_UUID on 10.13.24.92@o2ib failed (-16)
Apr 28 11:37:54 crnmds kernel: Lustre: 31982:0:(import.c:1091:ptlrpc_connect_interpret()) ffff81012e50d000 crn-OST0007_UUID: changing import state from CONNECTING to DISCONN
Apr 28 11:37:54 crnmds kernel: Lustre: 31982:0:(import.c:1137:ptlrpc_connect_interpret()) recovery of crn-OST0007_UUID on 10.13.24.91@o2ib failed (-16)
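
The (-16) in the "recovery of ... failed" messages is a negated Linux errno,
and errno 16 is EBUSY. For anyone sifting through a longer log of this kind,
here is a minimal Python sketch that pulls out the failing targets and decodes
the error code. The function name and the regular expression are illustrative
assumptions, not part of any Lustre tool, and the pattern is only known to
match the message format shown above.

```python
import errno
import re

# Matches 'recovery of <target> on <NID> failed (<code>)' as seen in the syslog above.
# List archives sometimes rewrite '@' as ' at ', so both NID spellings are accepted.
FAILED_RE = re.compile(r"recovery of (\S+) on (\S+(?: at \S+)?) failed \((-?\d+)\)")


def summarize_failures(log_text):
    """Return {target: (nid, errno_name)} for each failed recovery message."""
    # Mail clients wrap long syslog lines, so collapse all whitespace first.
    flat = " ".join(log_text.split())
    failures = {}
    for target, nid, code in FAILED_RE.findall(flat):
        # Kernel messages report errors as negative errno values.
        name = errno.errorcode.get(abs(int(code)), "UNKNOWN")
        failures[target] = (nid, name)
    return failures


sample = """
Apr 28 11:37:54 crnmds kernel: Lustre: 31982:0:(import.c:
1137:ptlrpc_connect_interpret()) recovery of crn-OST0013_UUID on
10.13.24.92@o2ib failed (-16)
Apr 28 11:37:54 crnmds kernel: Lustre: 31982:0:(import.c:
1137:ptlrpc_connect_interpret()) recovery of crn-OST0007_UUID on
10.13.24.91 at o2ib failed (-16)
"""

for target, (nid, name) in summarize_failures(sample).items():
    print(target, nid, name)
```

Run against the excerpt above, this reports EBUSY for both crn-OST0013_UUID
and crn-OST0007_UUID. It only decodes the error code; it does not explain why
the OSTs refuse the connection.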

It seems that we never see an 'oscc recovery finished' message on
crnmds for OST0007 or OST0013.

We have not seen this problem before so we are trying to figure out  
how to get the MDT reconnected to these two OSTs.

Has anyone else been through this before?

Thanks,

Charlie Taylor
UF HPC Center
