[Lustre-discuss] Recovery fails if clients not connected

Andreas Dilger adilger at sun.com
Wed Jan 21 14:04:40 PST 2009


On Jan 21, 2009  15:22 -0500, Roger Spellman wrote:
> If I say
> 
> lctl --device /dev/sdb abort_recovery
> 
> I get an error.  Is it supposed to be a device number?  Where do I get
> that number?

Use "lctl dl" to get the Lustre device number.  The --device argument
takes either that number or the Lustre device name printed by "lctl dl",
rather than a block device path.
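
As a rough sketch of that lookup, something like the following pulls the
device number out by name.  The helper name lustre_devno is made up for
illustration, and it assumes "lctl dl" prints the device number in the
first column and the device name in the fourth, which is worth checking
against your own output:

```shell
# Hypothetical helper: print the device number for a given Lustre device
# name.  Reads "lctl dl"-style output on stdin and assumes the columns are
# <number> <state> <type> <name> <uuid> <refcount>.
lustre_devno() {
    awk -v name="$1" '$4 == name { print $1 }'
}
```

On the MDS this would be used roughly as

    devno=$(lctl dl | lustre_devno tacc-MDT0000)
    lctl --device "$devno" abort_recovery

where tacc-MDT0000 is the name taken from the recovery_status path in the
quoted text below.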

> > -----Original Message-----
> > From: Andreas.Dilger at sun.com [mailto:Andreas.Dilger at sun.com] On Behalf Of
> > Andreas Dilger
> > Sent: Wednesday, January 21, 2009 12:41 AM
> > To: Roger Spellman
> > Cc: lustre-discuss at lists.lustre.org
> > Subject: Re: [Lustre-discuss] Recovery fails if clients not connected
> > 
> > On Jan 20, 2009  18:05 -0500, Roger Spellman wrote:
> > > When the MDSes started up, Linux HA chose one to be active.  That
> > > system mounted the MDT.
> > >
> > > I looked at the file /proc/fs/lustre/mds/tacc-MDT0000/recovery_status,
> > > and it showed:
> > >
> > > [root@ts-tacc-01 ~]# cat /proc/fs/lustre/mds/tacc-MDT0000/recovery_status
> > > status: RECOVERING
> > > recovery_start: 0
> > > time_remaining: 0
> > > connected_clients: 0/5
> > > completed_clients: 0/5
> > > replayed_requests: 0/??
> > > queued_requests: 0
> > > next_transno: 17768
> > >
> > >
> > > ***** Note that recovery_start and time_remaining are both zero. *****
> > >
> > > I waited several minutes, and this file was the same.
> > >
> > > I was waiting for recovery to complete before trying to mount the OSTs.
> > > However, it appears that this would never occur!
> > >
> > > Does this look like a bug?
> > 
> > No, this is intentional.  It is to avoid the situation where the MDS
> > is having network problems and a sysadmin might reboot the MDS to try
> > and resolve the problem.  The MDS will not begin recovery until at
> > least one of the clients connects to the MDS.
> > 
> > If you want to abort recovery without the clients being present you
> > can run "lctl --device ${mds_device} abort_recovery" on the MDS.
> > 
> > 
> > Cheers, Andreas
> > --
> > Andreas Dilger
> > Sr. Staff Engineer, Lustre Group
> > Sun Microsystems of Canada, Inc.
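
For anyone scripting around this, a minimal sketch of detecting the state
described in the quoted text (the target reports RECOVERING and no client
has connected yet) could look like the following.  The function name is
hypothetical, and the field names are simply those in the recovery_status
output quoted above:

```shell
# Hypothetical helper: read recovery_status text on stdin and succeed
# (exit status 0) only when status is RECOVERING and the first number in
# connected_clients is 0, i.e. recovery is still waiting for a client.
recovery_waiting_for_clients() {
    awk -F': *' '
        $1 == "status"            { status = $2 }
        $1 == "connected_clients" { split($2, c, "/"); connected = c[1] }
        END { exit !(status == "RECOVERING" && connected == "0") }
    '
}
```

It could be used as

    recovery_waiting_for_clients \
        < /proc/fs/lustre/mds/tacc-MDT0000/recovery_status \
        && echo "recovery has not started: no clients connected"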

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.



