[Lustre-devel] Imperative Recovery - forcing failover server stop blocking
Robert Read
rread at sun.com
Tue Jun 23 10:20:42 PDT 2009
On Jun 23, 2009, at 07:53 , Andreas Dilger wrote:
> On Jun 23, 2009 13:49 +0100, Eric Barton wrote:
>> Yes, of course, you can just tune down the recovery window in the
>> knowledge that explicit notification has speeded the whole process of
>> client reconnection. However if you have better knowledge about
>> client health than Lustre can have - e.g. hardware-specific health
>> monitoring, or just using the success/failure of the explicit
>> notification method itself - then why not use it to control exactly
>> when to stop waiting for dead clients?
>
> Yes, to restate this in a different way - the only way that Lustre itself
> knows that some client will NOT be participating is after the timeout has
> expired. If there is some external mechanism that can inform Lustre that
> one or more clients are dead and will not be participating in recovery
> then the recovery does not need to wait for the timeout.
The external mechanism should just evict the known-dead clients from
the server as soon as it discovers them, so the server can begin
recovery as soon as the live clients connect. Then we don't need to
worry about the timeout.
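
To make the idea concrete, here is a rough sketch of the decision the server
would be making. It is a toy model only, not Lustre code: the class
RecoveryState, its methods, and the client names are all hypothetical, and it
assumes the server knows the set of clients it had before the failover.

```python
from dataclasses import dataclass, field


@dataclass
class RecoveryState:
    """Toy model of a server waiting for clients after failover (not Lustre code)."""
    expected: set      # clients the server knew about before the failover
    timeout: float     # recovery window, in seconds
    connected: set = field(default_factory=set)
    evicted: set = field(default_factory=set)

    def reconnect(self, client):
        # A live client has reconnected to the restarted server.
        if client in self.expected and client not in self.evicted:
            self.connected.add(client)

    def evict(self, client):
        # Called by the external mechanism for a client it knows is dead.
        if client in self.expected:
            self.evicted.add(client)
            self.connected.discard(client)

    def can_begin_recovery(self, elapsed):
        # Recovery can begin once every expected client has either
        # reconnected or been evicted, or once the window has expired.
        waiting = self.expected - self.connected - self.evicted
        return not waiting or elapsed >= self.timeout


state = RecoveryState(expected={"c1", "c2", "c3"}, timeout=300.0)
state.reconnect("c1")
state.reconnect("c2")
print(state.can_begin_recovery(elapsed=5.0))  # False: still waiting on c3
state.evict("c3")
print(state.can_begin_recovery(elapsed=5.0))  # True: timeout no longer matters
```

With the dead client evicted up front, the timeout only matters for clients
that nobody has reported on either way.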
robert