[Lustre-discuss] MDTs recovering or not recovering

Mohr Jr, Richard Frank (Rick Mohr) rmohr at utk.edu
Wed May 28 15:49:32 PDT 2014


If recovery is aborted, any clients that did not complete the recovery process will be evicted by the MDS.  If I remember correctly, there is a limit on how long recovery will run.  The time limit might get extended as more clients reconnect, but if there is no activity from the clients, the whole recovery process should time out at some point.

What does "lctl get_param mdt.*.recovery_status" show?  Have any clients completed (or even started) recovery?  I don't think the recovery timeout starts counting down until at least one client has reconnected.  If something is preventing the clients from contacting the MDS, the server may just be sitting there waiting indefinitely.
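
To make the recovery_status check easier to read across several MDTs, here is a minimal Python sketch that parses text in the "key: value" form that recovery_status prints. The sample text is invented for illustration and is not output from a real system. The field names it relies on (status, connected_clients, completed_clients, time_remaining) are written from memory and may differ between Lustre versions. The helper names (parse_recovery_status, clients_connected, looks_stalled) are hypothetical.

```python
# Illustrative sample only; not captured from a real MDS.
SAMPLE = """\
mdt.testfs-MDT0000.recovery_status=
status: RECOVERING
time_remaining: 0
connected_clients: 0/64
completed_clients: 0
evicted_clients: 0
"""


def parse_recovery_status(text):
    """Group 'key: value' lines under the 'name=' header that precedes them."""
    targets = {}
    name = "unknown"  # used if the text has no 'name=' header line
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith("="):
            name = line[:-1]
            continue
        key, sep, value = line.partition(":")
        if sep:
            targets.setdefault(name, {})[key.strip()] = value.strip()
    return targets


def clients_connected(fields):
    """Return the connected count from a value such as '0/64' (assumed format)."""
    value = fields.get("connected_clients", "0")
    return int(value.split("/")[0])


def looks_stalled(fields):
    """True if the target is still recovering and no client has reconnected."""
    return fields.get("status") == "RECOVERING" and clients_connected(fields) == 0


if __name__ == "__main__":
    for target, fields in parse_recovery_status(SAMPLE).items():
        print(target, fields.get("status"), "stalled:", looks_stalled(fields))
```

To use it on a live server, you would replace SAMPLE with the saved output of the lctl command above. A target that is RECOVERING with zero connected clients matches the "server is just sitting there" case, so the first thing to check would be whether the clients can reach the MDS at all.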

-- 
Rick Mohr
Senior HPC System Administrator
National Institute for Computational Sciences
http://www.nics.tennessee.edu


On May 20, 2014, at 10:13 PM, Javed Shaikh <javed.shaikh at anu.edu.au> wrote:

> CentOS 6.4 / Lustre 2.4.2 (both client and servers)
>  
> hi,
>  
> it looks like MDTs are not recovering after more than 12 hours of being in that state.
> there’s hardly any activity happening on the MDS.
>  
> what would happen if the recovery is aborted through lctl?
>  
> thanks,
> javed
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss




