[Lustre-discuss] Recovery fails if clients not connected

Andreas Dilger adilger at sun.com
Tue Jan 20 21:41:17 PST 2009


On Jan 20, 2009  18:05 -0500, Roger Spellman wrote:
> When the MDSes started up, Linux HA chose one to be active.  That system
> mounted the MDT.
>  
> I looked at the file  /proc/fs/lustre/mds/tacc-MDT0000/recovery_status,
> and it showed:
>  
> [root@ts-tacc-01 ~]# cat /proc/fs/lustre/mds/tacc-MDT0000/recovery_status
> status: RECOVERING
> recovery_start: 0
> time_remaining: 0
> connected_clients: 0/5
> completed_clients: 0/5
> replayed_requests: 0/??
> queued_requests: 0
> next_transno: 17768
>  
>  
> ***** Note that recovery_start and time_remaining are both zero. *****
>  
> I waited several minutes, and the file had not changed.
>  
> I was waiting for recovery to complete before trying to mount the OSTs.
> However, it appears that this would never occur!
>  
> Does this look like a bug? 

No, this is intentional.  It is to avoid the situation where the MDS
is having network problems, a sysadmin reboots the MDS to try to
resolve them, and the recovery window then runs out before any client
can reconnect.  The MDS will not begin recovery until at least one of
the clients connects to the MDS.

If you want to abort recovery without the clients being present you
can run "lctl --device ${mds_device} abort_recovery" on the MDS.
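
As a rough illustration of the state described above, here is a minimal
Python sketch that parses recovery_status output and reports whether the
target is still waiting for its first client.  The function names and the
test used (status RECOVERING, recovery_start of 0, and no connected
clients) are assumptions made for this example, based only on the output
quoted above; they are not part of any Lustre tool.

```python
def parse_recovery_status(text):
    """Parse 'key: value' lines from recovery_status output into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def waiting_for_first_client(fields):
    """Heuristic (an assumption, not a Lustre API): the target reports
    RECOVERING, the recovery timer has not started, and no client has
    connected yet."""
    connected = fields.get("connected_clients", "0/0").split("/")[0]
    return (
        fields.get("status") == "RECOVERING"
        and fields.get("recovery_start") == "0"
        and connected == "0"
    )


# Sample taken from the output quoted earlier in this thread.
SAMPLE = """\
status: RECOVERING
recovery_start: 0
time_remaining: 0
connected_clients: 0/5
completed_clients: 0/5
replayed_requests: 0/??
queued_requests: 0
next_transno: 17768
"""

if __name__ == "__main__":
    fields = parse_recovery_status(SAMPLE)
    if waiting_for_first_client(fields):
        print("recovery has not started: no clients have connected")
```

On a live MDS the text would come from reading the recovery_status file
instead of SAMPLE.  If the check reports that the target is waiting, the
choices are to let a client connect or to run the abort_recovery command
given above.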


Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.



