[Lustre-devel] imperative recovery

Fri Jan 9 16:50:16 PST 2009

On Jan 09, 2009  09:04 -0800, Robert Read wrote:
> On Jan 9, 2009, at 07:27 , Nicholas Henke wrote:
> > This would be a great enhancement for OSS failover or reboot; it is
> > really the only way we'll get to recovery times under ~2.5 x obd_timeout.
> >
> > I do think this will miss a significant case: combo MGS+MDS. A  
> > majority of our customers are deploying with this configuration.
> > Perhaps exposing this mechanism on the clients via a /proc file
> > would be enough - that way a failover framework
> > could manually trigger the timeout and/or nid switching.
> 
> Yes, exactly what I was thinking. Exposing this feature via proc (or  
> lctl) on the clients is the first step. It has minimal impact,
> requires no changes to the server, and should integrate well with  
> existing failover frameworks.  We also need to get the server to end  
> recovery sooner (without waiting for all the stale exports), but VBR  
> should help with that.

Hey, wouldn't (essentially) "lctl --device $foo recover" do the trick
today?
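
Something like the following is roughly what I mean, as a dry-run
sketch only. It assumes "lctl dl" prints the device number in the first
column and the device type in the third; recover_cmds is a made-up
helper name, and it only prints the commands rather than running them.

```shell
# Dry-run sketch: emit one "lctl --device N recover" line per OSC device.
# Assumption: input lines look like "  3 UP osc <name> <uuid> <refcount>",
# i.e. device number in column 1 and device type in column 3.
recover_cmds() {
    awk '$3 == "osc" { printf "lctl --device %s recover\n", $1 }'
}

# Hypothetical usage on a client (prints commands, does not execute them):
#   lctl dl | recover_cmds
```

A failover framework could review that output and then run the
commands once the target has moved to its failover node.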

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.