[Lustre-discuss] failover software - heartbeat (Lundgren, Andrew)

Daniel Kulinski dank at weinmangeoscience.com
Mon Jul 13 13:50:58 PDT 2009


Andrew,

I was able to get ipfail working on my Heartbeat 2.1.3 installation.

Make sure the following line is uncommented in /etc/ha.d/ha.cf:
respawn hacluster /usr/lib64/heartbeat/ipfail

Along with that, you must have a ping line in the same file that lists each
ping host, separated by spaces.
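For illustration, a minimal ha.cf excerpt along those lines might look like
the following. The ping addresses are hypothetical placeholders (for example,
one reachable host per Ethernet network). Substitute hosts that actually
answer on each of your own networks. Also check where your distribution
installs ipfail, since the path may be /usr/lib rather than /usr/lib64.

```
# /etc/ha.d/ha.cf (excerpt): illustrative sketch only
# Have heartbeat start ipfail as the hacluster user and restart it if it exits.
respawn hacluster /usr/lib64/heartbeat/ipfail
# Ping nodes used to judge network connectivity, separated by spaces.
# These addresses are hypothetical placeholders, one per network.
ping 192.168.1.1 192.168.2.1 192.168.3.1
```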

We have tested this and it works perfectly. We have three Ethernet networks
to each OSS and MDS pair.

I have no idea what pingd is or how it relates to heartbeat.

Dan Kulinski

>
>Were you able to get monitoring working to detect network failures?
>(pingd?)
>
>I have it configured, but haven't been able to get it to trigger a
>failover when an MDS cannot ping the network. (I tried with 1.0 and 2.0
>conf files; I am currently using 2.0.) I have a ticket open with the
>Pacemaker project (no ticket system for the HA stuff...), but no
>resolution yet. I am considering writing a script to down the node when
>the ping fails, but I don't like the idea.
>
>I would also like to get hpingd functioning to detect a fiber failure,
>but there was less information available on that solution.
>
>--
>Andrew