[Lustre-discuss] failover software - heartbeat

Lundgren, Andrew Andrew.Lundgren at Level3.com
Mon Jul 13 13:25:17 PDT 2009


Were you able to get monitoring working to detect network failures?  (pingd?)

I have it configured, but haven't been able to get it to trigger a failover when an MDS cannot ping the network.  (I tried with both 1.0 and 2.0 conf files; I am currently using 2.0.)  I have a ticket open with the pacemaker project (there is no ticket system for the HA stuff...) but no resolution yet.  I am considering writing a script to down the node when the ping fails, but I don't like the idea.
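
For illustration only, here is a minimal sketch of what such a watchdog script could look like. It is not something from this thread or from the heartbeat project. It assumes a Linux iputils `ping`, that heartbeat's `hb_standby` helper is installed at the path shown (the location varies by distribution), and it uses a placeholder gateway address. The threshold and interval values are arbitrary.

```python
import subprocess
import time
from typing import Callable, List, Sequence

# Assumption: path and argument of heartbeat's standby helper; adjust per distro.
STANDBY_CMD = ["/usr/share/heartbeat/hb_standby", "all"]


def ping_once(host: str, timeout_s: int = 2) -> bool:
    """Send one ICMP echo (Linux iputils flags); True if a reply came back."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def should_fail_over(results: Sequence[bool], threshold: int) -> bool:
    """True only when the most recent `threshold` probes all failed."""
    if threshold <= 0 or len(results) < threshold:
        return False
    return not any(results[-threshold:])


def watch(
    host: str,
    probe: Callable[[str], bool] = ping_once,
    threshold: int = 3,
    interval_s: float = 5.0,
    standby_cmd: Sequence[str] = STANDBY_CMD,
    run: Callable[[List[str]], int] = subprocess.call,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Probe `host` until `threshold` consecutive failures, then run standby_cmd."""
    history: List[bool] = []
    while True:
        history.append(probe(host))
        history = history[-threshold:]  # keep only the window we care about
        if should_fail_over(history, threshold):
            # Give up resources so the partner node takes over.
            return run(list(standby_cmd))
        sleep(interval_s)


if __name__ == "__main__":
    # 192.0.2.1 is a documentation placeholder; use the real gateway address.
    raise SystemExit(watch("192.0.2.1"))
```

Requiring several consecutive failures before acting avoids failing over on a single dropped packet. A script like this still works outside the cluster manager's own monitoring, which is the main reason to be wary of it.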

I would also like to get hpingd working to detect a fiber failure, but there is less information available on that solution.

--
Andrew

> -----Original Message-----
> From: Jim Garlick [mailto:garlick at llnl.gov]
> Sent: Monday, July 13, 2009 2:21 PM
> To: Lundgren, Andrew
> Cc: Carlos Santana; lustre-discuss at lists.lustre.org
> Subject: Re: [Lustre-discuss] failover software - heartbeat
> 
> We recently put heartbeat v1 in production and along the way developed
> some admin scripts, including heartbeat resource agent compliant lustre
> init scripts, a script to initiate failover/failback and get detailed
> status, a powerman stonith interface, and various safeguards to ensure
> MMP is on, devices are present and usable, etc. before starting lustre.
> 
> If this is of general interest I could post it to a bug for review.
> 
> Jim
> 
> On Mon, Jul 13, 2009 at 01:45:02PM -0600, Lundgren, Andrew wrote:
> > It is very difficult to find relevant documentation for heartbeat
> > 1/2. I just finished configuring a heartbeat system and would not
> > recommend it because of the documentation.  (They seem to have removed
> > portions of the heartbeat documentation from the site.)
> >
> > Pacemaker is not a simple solution to configure either. I played
> > briefly with the RH clustering software.  It does not directly support
> > any FS type other than the basic ext2/ext3, and wasn't happy with a
> > lustre type.
> >
> > --
> > Andrew
> >
> > > -----Original Message-----
> > > From: lustre-discuss-bounces at lists.lustre.org [mailto:lustre-discuss-bounces at lists.lustre.org] On Behalf Of Carlos Santana
> > > Sent: Monday, July 13, 2009 11:42 AM
> > > To: lustre-discuss at lists.lustre.org
> > > Subject: [Lustre-discuss] failover software - heartbeat
> > >
> > > Howdy,
> > >
> > > > The lustre manual recommends heartbeat for handling failover.
> > > > Pacemaker is the successor of heartbeat version 2. So what's
> > > > recommended: should we be using pacemaker or stick with heartbeat?
> > >
> > > -
> > > CS.
> > > _______________________________________________
> > > Lustre-discuss mailing list
> > > Lustre-discuss at lists.lustre.org
> > > > http://lists.lustre.org/mailman/listinfo/lustre-discuss