[Lustre-discuss] Has anyone had experience with Heartbeat and DRBD providing full redundancy on Lustre clusters?

Thu Apr 23 22:59:39 PDT 2009

On Thursday, 23 April 2009 14:57:34, Thomas Roth wrote:

We are using Heartbeat (including STONITH) and DRBD for our MDT/MGS system.
The procedure and setup have been posted to this list.
The system works fine in our scenario and we haven't encountered any problems
so far. Total storage capacity is 110 TB (51% full) across 8 OSS RAIDs.
Heartbeat is a beast when it comes to configuring all possible events. Your
problem might be that you have to tell Heartbeat _explicitly_ what to do when
it recovers from _certain_ failovers. There are some values to look for in the
config file.
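For the late-heartbeat and spurious-failover symptoms Thomas describes, the timing directives in ha.cf (`keepalive`, `warntime`, `deadtime`, `initdead`) are a natural first place to look. Below is a minimal Python sketch, not something either poster used: the function names are hypothetical, it assumes each value is plain seconds or carries an `ms` suffix, and its two checks (keepalive < warntime < deadtime, and initdead at least twice deadtime) are rules of thumb rather than an official validator.

```python
"""Hypothetical sanity check for Heartbeat ha.cf timing directives.

Assumptions: values are plain seconds or use an 'ms' suffix; the checks
are rules of thumb, not an official Heartbeat validator.
"""
import sys

TIMING_KEYS = ("keepalive", "warntime", "deadtime", "initdead")


def parse_seconds(value: str) -> float:
    """Convert '2', '1.5' or '500ms' to seconds."""
    value = value.strip()
    if value.endswith("ms"):
        return float(value[:-2]) / 1000.0
    return float(value)


def read_timings(text: str) -> dict:
    """Collect the timing directives from ha.cf-style text, ignoring comments."""
    timings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop '#' comments
        if not line:
            continue
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0] in TIMING_KEYS:
            timings[parts[0]] = parse_seconds(parts[1])
    return timings


def check_timings(t: dict) -> list:
    """Return warnings for timing relationships that invite spurious failovers."""
    missing = [k for k in TIMING_KEYS if k not in t]
    if missing:
        return ["missing: " + ", ".join(missing)]
    problems = []
    if not t["keepalive"] < t["warntime"] < t["deadtime"]:
        problems.append("expected keepalive < warntime < deadtime")
    if t["initdead"] < 2 * t["deadtime"]:
        problems.append("initdead should be at least twice deadtime")
    return problems


# Only runs when a file path is given, so the functions stay importable.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as f:
        for msg in check_timings(read_timings(f.read())) or ["timings look consistent"]:
            print(msg)
```

On a shaky network, widening the gap between `warntime` and `deadtime` lets late heartbeats be logged as warnings instead of triggering a failover; the right numbers depend on the network and are not given in this thread.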

Heiko

> Hi,
>
> we are using Heartbeat+DRBD on our MGS/MDT.
> DRBD works fine; we have been using it for backing up the MDT, upgrading
> the Lustre version, and of course for failover.
> Heartbeat proves to be much trickier. Since our network is rather shaky,
> we are suffering from late heartbeats, Heartbeat trying to fail over for
> no apparent reason, STONITH for no good reason... In addition, if one
> unmounts the MDT, the unmount always starts with a delay of 330 s, and
> Heartbeat always gives up on the MDT resource after 20000 ms, no matter
> what I put into cib.xml and no matter the Lustre timeouts. So one always
> needs to force the umount or, more likely, a reboot. Not a problem with a
> reliable STONITH procedure ;-)
>
> Thomas
>
> Christopher Deneen wrote:
> > Trying to get a feel for whether it's worth investing time to implement.
> > _______________________________________________
> > Lustre-discuss mailing list
> > Lustre-discuss at lists.lustre.org
> > http://lists.lustre.org/mailman/listinfo/lustre-discuss