[Lustre-devel] Imperative Recovery - forcing failover server stop blocking

Andreas Dilger adilger at sun.com
Tue Jun 23 07:53:16 PDT 2009


On Jun 23, 2009  13:49 +0100, Eric Barton wrote:
> Yes, of course, you can just tune down the recovery window in the
> knowledge that explicit notification has speeded the whole process of
> client reconnection.  However if you have better knowledge about
> client health than Lustre can have - e.g. hardware-specific health
> monitoring, or just using the success/failure of the explicit
> notification method itself - then why not use it to control exactly
> when to stop waiting for dead clients?

Yes, to restate this another way: the only way Lustre itself can know that
a client will NOT be participating in recovery is for the timeout to
expire.  If some external mechanism can inform Lustre that one or more
clients are dead and will not be participating, then recovery does not
need to wait for the timeout.
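
To make the logic concrete, here is a minimal toy model in Python.  It is
not Lustre code: the RecoveryWindow class, its method names, and the
client_declared_dead() hook are all hypothetical, made up for illustration.
It only shows that the window can close as soon as every expected client
has either reconnected or been declared dead externally, with the timeout
kept as the fallback.

```python
import time


class RecoveryWindow:
    """Toy model of a server recovery window (illustrative only, not Lustre)."""

    def __init__(self, expected_clients, timeout_s, clock=time.monotonic):
        self.pending = set(expected_clients)  # clients not yet accounted for
        self.evicted = set()                  # clients dropped from recovery
        self.clock = clock
        self.deadline = clock() + timeout_s

    def client_reconnected(self, client):
        # The client has rejoined, so recovery no longer waits for it.
        self.pending.discard(client)

    def client_declared_dead(self, client):
        # Hypothetical hook: an external health monitor reports that this
        # client is dead, so there is no point waiting for it.
        if client in self.pending:
            self.pending.remove(client)
            self.evicted.add(client)

    def done(self):
        # Recovery can finish once nobody is left to wait for.
        if not self.pending:
            return True
        # Fallback: without external knowledge, only the timeout tells us
        # that the remaining clients are not going to participate.
        if self.clock() >= self.deadline:
            self.evicted |= self.pending
            self.pending.clear()
            return True
        return False


# Usage with a fake clock so that no real waiting is needed.
now = [0.0]
window = RecoveryWindow({"c1", "c2", "c3"}, timeout_s=300,
                        clock=lambda: now[0])
window.client_reconnected("c1")
window.client_reconnected("c2")
still_waiting = not window.done()   # c3 is unaccounted for, timeout not hit
window.client_declared_dead("c3")   # external notification arrives
finished_early = window.done()      # window closes without the timeout
```

Without the client_declared_dead() call, done() in this model stays false
until the clock passes the deadline, which is the behaviour described above.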

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.



