[Lustre-discuss] About MDS failover

Jeffrey Alan Bennett jab at sdsc.edu
Thu Jan 15 11:38:32 PST 2009

Hi Cliff,

> Are you using heartbeat V1 or V2?
I am using heartbeat V2. It works as expected; I just had to tune some timeouts. Even so, it still takes around 3 minutes to move the MGS/MDS services completely to the other system. I guess having the MGS and MDS on separate systems would help reduce this time. MMP (multiple-mount protection) also seems to add to this time somehow, but MMP is necessary for failover.

My biggest concern is that I can't handle the case where HBA connectivity to the storage system is lost. For example, if I pull the cables from the HBAs on the MGS/MDS, nothing happens: the MDS and MGS services keep running and their targets stay mounted, so heartbeat does nothing. From the heartbeat "documentation", it does not seem that this can be handled, at least not easily. I read something about HBA ping, but it seems to require HBAAPI, which does not work with Brocade HBAs...
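One possible workaround, offered as an untested sketch rather than a known fix: heartbeat V2's CRM drives OCF resource agents and treats a non-zero exit from a monitor action as a resource failure. A monitor that actually reads a block from the MDT device, instead of only checking that the target is mounted, should notice that the storage path has gone away. The function names, the 10-second timeout and the device path are hypothetical. The read uses O_DIRECT so the page cache cannot answer it, and it runs in a thread with a timeout because I/O to a device with no remaining paths may hang rather than fail.

```python
import mmap
import os
import threading

# Exit codes defined by the OCF resource agent API.
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1


def probe_device(path, timeout=10.0, block_size=4096, direct=True):
    """Return True if the first block of `path` can be read within `timeout` seconds."""
    result = {"ok": False}

    def _read():
        flags = os.O_RDONLY
        if direct and hasattr(os, "O_DIRECT"):
            # Bypass the page cache so a cached block cannot mask a dead path.
            flags |= os.O_DIRECT
        try:
            fd = os.open(path, flags)
        except OSError:
            return
        try:
            # O_DIRECT needs an aligned buffer; an anonymous mmap is page-aligned.
            buf = mmap.mmap(-1, block_size)
            try:
                result["ok"] = os.readv(fd, [buf]) > 0
            finally:
                buf.close()
        except OSError:
            pass
        finally:
            os.close(fd)

    # A read on a device that lost all its paths may block instead of failing,
    # so run it in a daemon thread and give up after `timeout` seconds.
    worker = threading.Thread(target=_read, daemon=True)
    worker.start()
    worker.join(timeout)
    return (not worker.is_alive()) and result["ok"]


def monitor(path, timeout=10.0):
    """Map the probe result to an OCF exit code for a monitor action."""
    return OCF_SUCCESS if probe_device(path, timeout) else OCF_ERR_GENERIC
```

A wrapper agent's monitor action could then end with something like `sys.exit(monitor("/dev/mapper/mdt0"))` (hypothetical device name), so that a pulled HBA cable shows up as a failed resource and the CRM can move the MGS/MDS. If a timed-out read stays blocked, its thread keeps the file descriptor open, which is acceptable for a short-lived monitor process but not for a long-running daemon.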

Any help will be greatly appreciated.

> I would like to hear more about the issues you are experiencing.
> We have had some people use the Red Hat cluster tools.
I will try the Red Hat cluster tools.



