[Lustre-discuss] failover software - heartbeat
Lundgren, Andrew
Andrew.Lundgren at Level3.com
Mon Jul 13 13:25:17 PDT 2009
Were you able to get monitoring working to detect network failures? (pingd?)
I have it configured, but haven't been able to get it to trigger a failover when an MDS cannot ping the network. (I tried with both 1.0 and 2.0 conf files; I am currently using 2.0.) I have a ticket open with the pacemaker project (there is no ticket system for the HA stuff...) but no resolution. I am considering writing a script to take the node down when the ping fails, but I don't like the idea.
I would also like to get hpingd working to detect a fiber failure, but there is even less information available on that solution.
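
For reference, the "down the node when the ping fails" workaround could look roughly like the sketch below. It is only an illustration under assumptions, not a tested failover mechanism: the miss threshold, the polling interval, and the use of "/etc/init.d/heartbeat stop" as the action are all guesses about a typical setup, and the ping flags assume Linux iputils. The function names (ping_ok, watch, stop_heartbeat) are hypothetical.

```python
import subprocess
import time


def ping_ok(host, timeout_s=2):
    """Return True if one ICMP echo to host succeeds (assumes Linux iputils ping)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def watch(hosts, on_failure, max_misses=3, interval_s=5,
          probe=ping_ok, sleep=time.sleep):
    """Poll hosts; call on_failure() after max_misses consecutive rounds
    in which none of the hosts answered. Returns the miss count."""
    misses = 0
    while True:
        if any(probe(h) for h in hosts):
            misses = 0  # at least one target answered, network looks fine
        else:
            misses += 1
            if misses >= max_misses:
                on_failure()
                return misses
        sleep(interval_s)


def stop_heartbeat():
    # Assumption: stopping the local heartbeat service makes the peer
    # take over the resources. The init script path may differ per distro.
    subprocess.run(["/etc/init.d/heartbeat", "stop"], check=False)
```

It would be started with something like watch(["<gateway address>"], stop_heartbeat), where the ping targets are whatever hosts the MDS must be able to reach. Requiring several consecutive misses and more than one ping target reduces the chance that a single dropped packet takes a healthy node down, which is the main risk of a hand-rolled script like this.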
--
Andrew
> -----Original Message-----
> From: Jim Garlick [mailto:garlick at llnl.gov]
> Sent: Monday, July 13, 2009 2:21 PM
> To: Lundgren, Andrew
> Cc: Carlos Santana; lustre-discuss at lists.lustre.org
> Subject: Re: [Lustre-discuss] failover software - heartbeat
>
> We recently put heartbeat v1 in production and along the way developed
> some admin scripts, including heartbeat resource agent compliant lustre
> init scripts, a script to initiate failover/failback and get detailed
> status, a powerman stonith interface, and various safeguards to ensure
> MMP is on, devices are present and usable, etc. before starting lustre.
>
> If this is of general interest I could post it to a bug for review.
>
> Jim
>
> On Mon, Jul 13, 2009 at 01:45:02PM -0600, Lundgren, Andrew wrote:
> > It is very difficult to find relevant documentation for heartbeat
> > 1/2. I just finished configuring a heartbeat system and would not
> > recommend it because of the documentation. (They seem to have removed
> > portions of the heartbeat documentation from the site.)
> >
> > Pacemaker is not a simple solution to configure either. I played
> > briefly with the RH clustering software. It does not directly support
> > any FS type other than the basic ext2/ext3, and wasn't happy with a
> > lustre type.
> >
> > --
> > Andrew
> >
> > > -----Original Message-----
> > > From: lustre-discuss-bounces at lists.lustre.org [mailto:lustre-discuss-bounces at lists.lustre.org] On Behalf Of Carlos Santana
> > > Sent: Monday, July 13, 2009 11:42 AM
> > > To: lustre-discuss at lists.lustre.org
> > > Subject: [Lustre-discuss] failover software - heartbeat
> > >
> > > Howdy,
> > >
> > > The lustre manual recommends heartbeat for handling failover.
> > > Pacemaker is the successor of heartbeat version 2. So what's
> > > recommended: should we be using pacemaker or stick with heartbeat?
> > >
> > > -
> > > CS.
> > > _______________________________________________
> > > Lustre-discuss mailing list
> > > Lustre-discuss at lists.lustre.org
> > > http://*lists.lustre.org/mailman/listinfo/lustre-discuss