[Lustre-discuss] About MDS failover

Andreas Dilger adilger at sun.com
Thu Jan 15 17:19:07 PST 2009

On Jan 15, 2009  11:38 -0800, Jeffrey Alan Bennett wrote:
> I am using heartbeat V2. It works as expected, I just had to tune some
> timeouts, but it still takes around 3 minutes to fully move the MGS/MDS
> services to the other system.

This is largely an issue with Lustre failover itself, not the HA
software.  The problem today is that under heavy load the clients may
have to wait a long time (hundreds of seconds in some cases) for requests
sent to the server to complete, so it is difficult for the clients to
distinguish between server death (unlikely) and heavy server load (common).

When a server dies and fails over, the clients first have to wait for
their requests to time out.  They then resend to the same server and wait
again, because in the common case the server is merely overloaded.  Only
after that do they try to contact any other server listed as a failover
for that node.
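
To see why this adds up to minutes, here is a toy model of the sequence
above.  It is only a sketch, not Lustre client code: the function name,
the server names, the single resend and the 100-second timeout are all
assumptions chosen for illustration.

```python
def find_live_server(servers, request_timeout_s, resend_attempts=1):
    """Toy model of the client-side wait described above (not Lustre code).

    servers: list of (name, is_alive) tuples, primary first, then the
             nodes listed as failover for it (hypothetical names).
    Returns (name of the server that answered or None, seconds waited).
    """
    waited = 0.0
    primary_name, primary_alive = servers[0]

    # The original request and every resend go to the primary, because
    # the client assumes the server is overloaded rather than dead.
    for _ in range(1 + resend_attempts):
        if primary_alive:
            return primary_name, waited
        waited += request_timeout_s

    # Only now does the client try the failover nodes, in order.
    for name, alive in servers[1:]:
        if alive:
            return name, waited
        waited += request_timeout_s

    return None, waited


# Assumed values: dead primary, live backup, 100 s request timeout.
server, waited = find_live_server([("mds1", False), ("mds2", True)], 100)
print(server, waited)
```

With these assumed numbers the client spends two full timeouts (200
seconds) on the dead primary before it even contacts the backup, which is
the same order of magnitude as the 3 minutes reported.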

To improve failover speed, we are looking at having the backup server
broadcast to the clients that it has taken over the OST/MDT once it has
started.  The clients can then fail over to the new server as soon as it
is ready, instead of waiting for the original requests to time out.

> My biggest concern is that I can't control the situation in which
> the HBA connectivity with the storage system is damaged, i.e. I pull the
> cables from the HBAs on the MGS/MDS and nothing happens, the MDS and MGS
> services keep running, they are still mounted and therefore heartbeat
> does nothing. From the heartbeat "documentation" it does not seem that
> this can be done, at least not easily. I read something about HBA ping and
> it seems it requires HBAAPI which does not work with Brocade HBAs...

You can use HBA multi-pathing to avoid this problem, if your hardware
supports it.  You can also use /proc/fs/lustre/health_check to check
if the filesystems have encountered errors and are marked "unhealthy".
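
A monitoring script could poll that file along the following lines.  This
is a minimal sketch, assuming the file contains the single word "healthy"
when nothing is wrong; the function name is made up for illustration, and
treating an unreadable file as unhealthy is a design choice of this
sketch, not documented Lustre behaviour.

```python
from pathlib import Path

# Path taken from the message above.
HEALTH_CHECK_PATH = "/proc/fs/lustre/health_check"


def lustre_is_healthy(path=HEALTH_CHECK_PATH):
    """Return True only if the health_check file reports "healthy".

    Assumption: a healthy node reports exactly the word "healthy";
    any other content is treated as unhealthy.
    """
    try:
        status = Path(path).read_text().strip()
    except OSError:
        # Missing or unreadable file (e.g. Lustre modules not loaded):
        # treated as unhealthy here, which is a choice of this sketch.
        return False
    return status == "healthy"
```

An HA resource monitor could call this periodically and report a failure
when it returns False.  How that is wired into heartbeat depends on the
resource agent in use and is not shown here.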

Cheers, Andreas
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
