[Lustre-discuss] Redhat cluster failover

Giacomo Montagner gmontagner at sorint.it
Fri Jun 26 06:24:35 PDT 2009


Hi Daire,
it looks good; have you tried it? You could also check for the matching

/proc/fs/lustre/obdfilter/<OST>

entry, to see whether the OST is mounted and working well.
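
A minimal sketch of such a check, combining the obdfilter entry with
/proc/fs/lustre/health_check, might look like the following. The function
name ost_status and the LUSTRE_PROC override are made up for illustration
(the override only exists so the check can be tried away from a real
server), and it assumes health_check contains exactly "healthy" when all
is well.

```shell
#!/bin/sh
# LUSTRE_PROC is a hypothetical override; on a real server it stays at the default.
LUSTRE_PROC="${LUSTRE_PROC:-/proc/fs/lustre}"

# ost_status TARGET
# Returns 0 if TARGET has an obdfilter entry and the node reports "healthy",
# non-zero otherwise. Assumption: a mounted OST shows up as a directory
# under obdfilter/, named after the target (e.g. delta-OST0000).
ost_status() {
    target="$1"

    # 1. The OST must be set up on this node.
    [ -d "$LUSTRE_PROC/obdfilter/$target" ] || return 1

    # 2. The node-wide health file must be readable.
    [ -r "$LUSTRE_PROC/health_check" ] || return 1

    # 3. Anything other than exactly "healthy" is treated as a failure.
    state=$(cat "$LUSTRE_PROC/health_check")
    [ "$state" = "healthy" ]
}
```

Called as "ost_status delta-OST0000" from the "status" branch of the
script, a non-zero return would be what tells RHCS the service has failed.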

Bye, 
Giacomo


On Wed, 2009-06-24 at 11:45 +0100, Daire.Byrne at framestore.com wrote:
> Giacomo,
> 
> I had not considered using RHCS's mount filesystem plugin "fs.sh". I was thinking of just using the "script" plugin with mount/umount commands in it. As far as I can tell the main advantage of this is that it is trivial to add checks to the "status" return to notify RHCS when an OST has had a failure (e.g. /proc/fs/lustre/health_check). I have included a quick proof of concept (untested).
> 
> My idea is to create symlinks to this script named after the OST devices (e.g. delta-OST0000 -> lustre.init) and then add them as script services in RHCS. Are there more rigorous checks that people use to verify the health of a Lustre mount, beyond just reading /proc/fs/lustre/health_check?
> 
> Daire
> 
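
For reference, the symlink-named script idea described above could be
sketched roughly as follows. This is not the proof of concept that was
attached to the original mail; it is only an illustration. The device and
mount point layout (DEV_DIR, MNT_DIR), the RUN dry-run switch and the
function name lustre_service are all assumptions, and the status branch
only tests whether the target is mounted.

```shell
#!/bin/sh
# Hypothetical layout: device DEV_DIR/<target>, mount point MNT_DIR/<target>.
DEV_DIR="${DEV_DIR:-/dev/mapper}"
MNT_DIR="${MNT_DIR:-/mnt/lustre}"
# Set RUN=echo for a dry run that prints the commands instead of running them.
RUN="${RUN:-}"

# lustre_service TARGET ACTION
lustre_service() {
    target="$1"
    action="$2"
    case "$action" in
        start)
            $RUN mount -t lustre "$DEV_DIR/$target" "$MNT_DIR/$target"
            ;;
        stop)
            $RUN umount "$MNT_DIR/$target"
            ;;
        status)
            # Mounted targets appear in /proc/mounts with type "lustre".
            grep -q " $MNT_DIR/$target lustre " /proc/mounts
            ;;
        *)
            echo "usage: $target {start|stop|status}" >&2
            return 2
            ;;
    esac
}

# When invoked through a symlink such as delta-OST0000 -> lustre.init,
# the target name is taken from the name the script was called by.
if [ "$#" -gt 0 ]; then
    lustre_service "$(basename "$0")" "$1"
    exit $?
fi
```

With RUN=echo set, "delta-OST0000 start" only prints the mount command,
which is a cheap way to check the naming before letting RHCS drive it.
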
> ----- "Giacomo Montagner" <gmontagner at sorint.it> wrote:
> 
> > On Tue, 2009-06-23 at 12:52 +0100, Daire.Byrne at framestore.com wrote:
> > > Hi,
> > > 
> > > I know that heartbeat is the preferred failover application for
> > > Lustre but I want to evaluate Redhat's cluster suite again. It used to
> > > be pretty ropey in the RHEL4 days but I'm led to believe it is much
> > > improved in RHEL5. I was wondering if anyone is currently using this
> > > with Lustre and if so could you share your init.d script to help get
> > > me started? Any other advice or thoughts gratefully accepted.
> > > 
> > > Regards,
> > > 
> > > Daire 
> > 
> > Hi!
> > I'm using RHCS on RHEL 5.3 in a test environment (VMware virtual
> > machines, nothing special) to fail over an MGS, an MDT and four OSTs
> > across two VMs. It works pretty well. I only needed to modify the
> > original fs.sh resource agent script and disable almost every check;
> > the only surviving check, for now, is "it's mounted/it's not mounted".
> > I would like to rewrite the RA script to make it work better (with
> > some effective check to see if a target is really working as it
> > should) but I haven't had time yet. I attach the RA script. It's ugly,
> > and some of the comments may be complete nonsense or out of place.
> > And perhaps my English often gets funny (let's say funny).
> > I'm using LVM-HA to ensure no device gets mounted twice, but it would
> > probably be an unbearable overhead in a true production environment
> > (I think). Maybe the Lustre MMP is enough.
> > 
> > Bye!
> > Giacomo
> > 
> > > _______________________________________________
> > > Lustre-discuss mailing list
> > > Lustre-discuss at lists.lustre.org
> > > http://lists.lustre.org/mailman/listinfo/lustre-discuss





