[Lustre-devel] imperative recovery

Fri Jan 9 07:27:53 PST 2009

Nathaniel Rutman wrote:
> Eric Barton wrote:
>>> Other options I've thought of to explore this idea:
>>>
>>> - MGS notifies clients (somehow) after a server has restarted.
>>>     
> This seems like a no-brainer easy win today, and doesn't depend on any 
> advanced features like message priority.  The only scalability issue 
> would seem to be the broadcast of the message to all clients, but this 
> is no different than the current broadcast mechanism the MGS employs to 
> update client configs.  The message from the MGS would be taken as a 
> suggestion, "Why don't y'all time out all your current RPCs since I 
> noticed OST0004 restarted.  Oh, and use failover nid #2."  Current 
> replay/recovery need not be touched.

This would be a great enhancement for OSS failover or reboot; it is really the
only way we'll get to recovery times under ~2.5 x obd_timeout. Adaptive Timeouts 
really aren't buying us much here, as at scale and under load we are seeing the 
timeouts approach the usual static obd_timeout of 300s. It only takes one client 
with a higher timeout to push the recovery time out.

I do think this will miss a significant case: a combined MGS+MDS, where the MGS
goes down along with the MDS and so cannot send the notice. A majority of our
customers are deploying with this configuration. Perhaps exposing this mechanism
on the clients via a /proc file would be enough; that way a failover framework
could manually trigger the timeout and/or nid switching.

Nic