[Lustre-devel] Imperative Recovery - forcing failover server stop blocking

Robert Read rread at sun.com
Tue Jun 23 10:20:42 PDT 2009


On Jun 23, 2009, at 07:53 , Andreas Dilger wrote:

> On Jun 23, 2009  13:49 +0100, Eric Barton wrote:
>> Yes, of course, you can just tune down the recovery window in the
>> knowledge that explicit notification has speeded the whole process of
>> client reconnection.  However if you have better knowledge about
>> client health than Lustre can have - e.g. hardware-specific health
>> monitoring, or just using the success/failure of the explicit
>> notification method itself - then why not use it to control exactly
>> when to stop waiting for dead clients?
>
> Yes, to restate this in a different way - the only way that Lustre itself
> knows that some client will NOT be participating is after the timeout has
> expired.  If there is some external mechanism that can inform Lustre that
> one or more clients are dead and will not be participating in recovery
> then the recovery does not need to wait for the timeout.

The external mechanism should just evict the known dead clients from  
the server as soon as it discovers them so the server can begin  
recovery as soon as the live clients connect. Then we don't need to  
worry about the timeout.
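
To make the idea concrete, here is a rough toy model of the decision, not
actual Lustre code. The class and method names (RecoveryState, reconnect,
evict, can_start_recovery) are made up for illustration, and the "external
mechanism" is simply whoever calls evict(). The server stops waiting once
every expected client has either reconnected or been evicted, and the
timeout only matters as a fallback.

```python
from dataclasses import dataclass, field


@dataclass
class RecoveryState:
    """Toy model of a server's recovery window (hypothetical, not Lustre code)."""
    expected: set      # clients the server knew about before failover
    timeout: float     # recovery window, in seconds
    connected: set = field(default_factory=set)
    evicted: set = field(default_factory=set)

    def reconnect(self, client):
        # A live client reconnects; evicted clients are not let back in.
        if client in self.expected and client not in self.evicted:
            self.connected.add(client)

    def evict(self, client):
        # Called by the (assumed) external health monitor for a dead client.
        if client in self.expected:
            self.evicted.add(client)
            self.connected.discard(client)

    def can_start_recovery(self, elapsed):
        # Start once nobody is outstanding, or fall back to the timeout.
        waiting_on = self.expected - self.connected - self.evicted
        return not waiting_on or elapsed >= self.timeout


state = RecoveryState(expected={"c1", "c2", "c3"}, timeout=300.0)
state.reconnect("c1")
state.reconnect("c2")
before_evict = state.can_start_recovery(elapsed=10.0)  # still waiting on c3
state.evict("c3")                                      # monitor reports c3 dead
after_evict = state.can_start_recovery(elapsed=10.0)   # nothing left to wait for
```

In this sketch the timeout value never has to be tuned down: evicting c3
removes the only client the server was still waiting on.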

robert




