[Lustre-discuss] failover software - heartbeat

Lundgren, Andrew Andrew.Lundgren at Level3.com
Mon Jul 13 13:41:09 PDT 2009


Are you doing anything if the network to one MDS fails?

How about if your fiber path fails?
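
The ping-based check discussed in the quoted thread below can be sketched as a small decision function. This is only an illustrative Python sketch, not heartbeat's ipfail or pingd logic. The names `host_reachable`, `PingMonitor`, `fail_threshold` and `poll` are made up for this example. Treating a ping group as up while any member answers follows the "losing contact with both lnet routers" comment in the quoted ha.cf snippet. Requiring several consecutive failed rounds is an assumption, shown as one possible way to avoid reacting to a brief loss of contact such as an MDS reboot.

```python
import subprocess


def host_reachable(host, timeout_s=1):
    """Return True if one ICMP echo to `host` succeeds (Linux iputils `ping`)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


class PingMonitor:
    """Count consecutive rounds in which every host of a ping group was unreachable.

    Hypothetical helper for illustration; it does not talk to heartbeat.
    """

    def __init__(self, hosts, fail_threshold=3, probe=host_reachable):
        self.hosts = list(hosts)
        self.fail_threshold = fail_threshold  # assumed debounce value
        self.probe = probe                    # injectable so it can be tested
        self.failed_rounds = 0

    def poll(self):
        """Probe the group once; return True when failover should be requested."""
        # The group counts as up if at least one member answers.
        group_up = any(self.probe(h) for h in self.hosts)
        self.failed_rounds = 0 if group_up else self.failed_rounds + 1
        return self.failed_rounds >= self.fail_threshold
```

A caller would construct `PingMonitor(["172.16.10.254", "172.16.2.254"])` and call `poll()` on a timer. What to do on a True result (for example putting the node into standby) is deliberately left out, since that depends on the cluster software in use.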

> -----Original Message-----
> From: Jim Garlick [mailto:garlick at llnl.gov]
> Sent: Monday, July 13, 2009 2:39 PM
> To: Lundgren, Andrew
> Cc: Carlos Santana; lustre-discuss at lists.lustre.org
> Subject: Re: [Lustre-discuss] failover software - heartbeat
> 
> No.  I originally did have it set up like this (a v1 ha.cf snippet):
> 
> # One partner losing contact with both lnet routers or MDS triggers failover.
> #ping_group lnet-router 172.16.10.254 172.16.2.254
> #ping_group tycho-mds1 172.16.10.200 172.16.2.200
> #respawn hacluster /usr/lib64/heartbeat/ipfail
> 
> However, I ran into a problem when rebooting the MDS.  Apparently if one
> partner re-establishes contact with the MDS before the other one, it
> immediately triggers failover.  This is with heartbeat-2.1.4.
> 
> Jim
> 
> On Mon, Jul 13, 2009 at 02:25:17PM -0600, Lundgren, Andrew wrote:
> > Were you able to get monitoring working to detect network failures? (pingd?)
> >
> > I have it configured, but haven't been able to get it to trigger a
> > failover when an MDS cannot ping the network.  (I tried with both 1.0
> > and 2.0 conf files; I am currently using 2.0.)  I have a ticket open
> > with the pacemaker project (there is no ticket system for the HA
> > stuff...) but no resolution.  I am considering writing a script to down
> > the node when the ping fails, but I don't like the idea.
> >
> > I would also like to get hpingd functioning to detect a fiber
> > failure, but there was less information available on that solution.
> >
> > --
> > Andrew
> >
> > > -----Original Message-----
> > > From: Jim Garlick [mailto:garlick at llnl.gov]
> > > Sent: Monday, July 13, 2009 2:21 PM
> > > To: Lundgren, Andrew
> > > Cc: Carlos Santana; lustre-discuss at lists.lustre.org
> > > Subject: Re: [Lustre-discuss] failover software - heartbeat
> > >
> > > We recently put heartbeat v1 in production and along the way
> > > developed some admin scripts, including heartbeat resource agent
> > > compliant lustre init scripts, a script to initiate failover/failback
> > > and get detailed status, a powerman stonith interface, and various
> > > safeguards to ensure MMP is on, devices are present and usable, etc.
> > > before starting lustre.
> > >
> > > If this is of general interest I could post it to a bug for review.
> > >
> > > Jim
> > >
> > > On Mon, Jul 13, 2009 at 01:45:02PM -0600, Lundgren, Andrew wrote:
> > > > It is very difficult to find relevant documentation for heartbeat
> > > > 1/2. I just finished configuring a heartbeat system and would not
> > > > recommend it because of the documentation.  (They seem to have
> > > > removed portions of the heartbeat documentation from the site.)
> > > >
> > > > Pacemaker is not a simple solution to configure either. I played
> > > > briefly with the RH clustering software.  It does not directly
> > > > support any FS type other than the basic ext2/ext3, and wasn't happy
> > > > with a lustre type.
> > > >
> > > > --
> > > > Andrew
> > > >
> > > > > -----Original Message-----
> > > > > From: lustre-discuss-bounces at lists.lustre.org
> > > > > [mailto:lustre-discuss-bounces at lists.lustre.org] On Behalf Of
> > > > > Carlos Santana
> > > > > Sent: Monday, July 13, 2009 11:42 AM
> > > > > To: lustre-discuss at lists.lustre.org
> > > > > Subject: [Lustre-discuss] failover software - heartbeat
> > > > >
> > > > > Howdy,
> > > > >
> > > > > The lustre manual recommends heartbeat for handling failover.
> > > > > Pacemaker is the successor of heartbeat version 2. So what's
> > > > > recommended - should we be using pacemaker or stick to heartbeat?
> > > > >
> > > > > -
> > > > > CS.
> > > > > _______________________________________________
> > > > > Lustre-discuss mailing list
> > > > > Lustre-discuss at lists.lustre.org
> > > > > http://lists.lustre.org/mailman/listinfo/lustre-discuss


