[Lustre-discuss] MDTs recovering or not recovering
Mohr Jr, Richard Frank (Rick Mohr)
rmohr at utk.edu
Wed May 28 15:49:32 PDT 2014
If recovery is aborted, any clients that did not complete the recovery process will be evicted by the MDS. If I remember correctly, there is a limit on how long recovery will run. The time limit might get extended as more clients reconnect, but if there is no activity from the clients, the whole recovery process should time out at some point.

What does "lctl get_param mdt.*.recovery_status" show? Have any clients completed (or even started) recovery? I don't think the recovery timeout starts counting down until at least one client has reconnected. If something is preventing the clients from contacting the MDS, maybe the server is just sitting there indefinitely.
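
If it helps, here is a rough sketch of how the recovery_status output could be checked from a script. It assumes the output is a header line ending in "=" followed by "key: value" lines, and that connected_clients is reported as "connected/total"; I have not verified either against 2.4.2, so adjust as needed. The function names and the sample text are made up for illustration and are not real output from any system.

```python
import subprocess

# Hypothetical sample text, only to exercise the parser below.
SAMPLE = """\
mdt.testfs-MDT0000.recovery_status=
status: RECOVERING
time_remaining: 0
connected_clients: 0/128
completed_clients: 0/128
evicted_clients: 0
"""


def parse_recovery_status(text):
    """Return {target_header: {field: value}} from recovery_status text."""
    targets = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith("="):
            # Assumed header form: "mdt.<target>.recovery_status="
            current = line[:-1]
            targets[current] = {}
            continue
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip anything that is not "key: value"
        if current is None:
            current = "unknown"
            targets[current] = {}
        targets[current][key.strip()] = value.strip()
    return targets


def no_clients_connected(fields):
    """True if the target is still RECOVERING and zero clients have connected."""
    connected = fields.get("connected_clients", "0/0").split("/")[0]
    return fields.get("status") == "RECOVERING" and int(connected) == 0


def read_status():
    """Run the command from this thread on the MDS and return its output."""
    result = subprocess.run(
        ["lctl", "get_param", "mdt.*.recovery_status"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Report targets from the sample that look stuck waiting for clients.
for name, fields in parse_recovery_status(SAMPLE).items():
    if no_clients_connected(fields):
        print(name, "is RECOVERING with no clients connected")
```

On the MDS itself, passing read_status() to parse_recovery_status() in place of SAMPLE would do the same check against the live values. A target that stays in RECOVERING with zero connected clients would fit the "server is just sitting there" case described above.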
Senior HPC System Administrator
National Institute for Computational Sciences
On May 20, 2014, at 10:13 PM, Javed Shaikh <javed.shaikh at anu.edu.au> wrote:
> CentOS 6.4 / Lustre 2.4.2 (both client and servers)
> it looks like MDTs are not recovering after more than 12 hours of being in that state.
> there’s hardly any activity happening on the MDS.
> what would happen if the recovery is aborted through lctl?