[Lustre-discuss] Redhat cluster failover

Daire.Byrne at framestore.com Daire.Byrne at framestore.com
Wed Jun 24 03:45:57 PDT 2009


Giacomo,

I had not considered using RHCS's filesystem mount plugin, "fs.sh". I was thinking of just using the "script" plugin with mount/umount commands in it. As far as I can tell, the main advantage of the script approach is that it is trivial to add checks to the "status" action so that RHCS is notified when an OST has failed (e.g. by reading /proc/fs/lustre/health_check). I have attached a quick proof of concept (untested).

My idea is to create symlinks to this script named after the OST devices (e.g. delta-OST0000 -> lustre.init) and then add them as script services in RHCS. Do people use more rigorous checks for the health of a Lustre mount than just reading /proc/fs/lustre/health_check?
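
To make the idea concrete, here is a rough sketch of the kind of wrapper I mean. It is not the attached lustre.init, just an illustration: the DEV_DIR and MNT_DIR locations, the device naming and the function names are all assumptions, and it relies on health_check reading "healthy" when all is well, which should be verified on your Lustre version.

```shell
#!/bin/sh
# Hypothetical init-style wrapper for a single Lustre target.
# The target name is taken from the name the script is invoked as,
# so one script can serve many targets through symlinks.
# HEALTH_FILE, DEV_DIR and MNT_DIR are illustrative assumptions.

HEALTH_FILE="${HEALTH_FILE:-/proc/fs/lustre/health_check}"
DEV_DIR="${DEV_DIR:-/dev/mapper}"     # assumed: devices named after targets
MNT_DIR="${MNT_DIR:-/mnt/lustre}"     # assumed: parent of the mount points

# Succeed only if the health file is readable and contains "healthy".
lustre_healthy() {
    [ -r "$1" ] && [ "$(cat "$1")" = "healthy" ]
}

# Derive the target name (e.g. delta-OST0000) from the invocation path.
target_name() {
    basename "$1"
}

# Handle the start/stop/status actions that RHCS passes to script resources.
lustre_service() {
    target=$(target_name "$1")
    case "$2" in
        start)
            mkdir -p "$MNT_DIR/$target" &&
                mount -t lustre "$DEV_DIR/$target" "$MNT_DIR/$target"
            ;;
        stop)
            umount "$MNT_DIR/$target"
            ;;
        status)
            # Non-zero if the target is not mounted or Lustre is unhealthy.
            grep -q " $MNT_DIR/$target lustre " /proc/mounts &&
                lustre_healthy "$HEALTH_FILE"
            ;;
        *)
            echo "Usage: $target {start|stop|status}" >&2
            return 2
            ;;
    esac
}

# Dispatch only when an action was supplied on the command line.
if [ $# -gt 0 ]; then
    lustre_service "$0" "$1"
    exit $?
fi
```

With a symlink such as delta-OST0000 -> lustre.init, running "delta-OST0000 status" would return non-zero when the target is unmounted or health_check reports a problem, which is what should prompt RHCS to act on the service.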

Daire

----- "Giacomo Montagner" <gmontagner at sorint.it> wrote:

> On Tue, 2009-06-23 at 12:52 +0100, Daire.Byrne at framestore.com wrote:
> > Hi,
> > 
> > I know that heartbeat is the preferred failover application for
> > Lustre but I want to evaluate Redhat's cluster suite again. It used
> > to be pretty ropey in the RHEL4 days but I'm led to believe it is
> > much improved in RHEL5. I was wondering if anyone is currently using
> > this with Lustre and if so could you share your init.d script to
> > help get me started? Any other advice or thoughts gratefully accepted.
> > 
> > Regards,
> > 
> > Daire 
> 
> Hi!
> I'm using RHCS on RHEL 5.3 in a test environment (VMware virtual
> machines, nothing special) to fail over an MGS, an MDT and four OSTs
> across two VMs. It works pretty well; I only needed to modify the
> original fs.sh resource agent (RA) script and disable almost every
> check. The only surviving check, for now, is "it's mounted/it's not
> mounted". I would like to rewrite the RA script to make it work better
> (with some effective check to see whether a target is really working
> as it should) but I haven't had time yet. I attach the RA script. It's
> ugly, and some of the comments may be complete nonsense or out of
> place. And perhaps my English often gets funny (let's say funny).
> I'm using LVM-HA to ensure no device gets mounted twice, but it would
> probably be an unbearable overhead in a true production environment
> (I think). Maybe the Lustre MMP is enough.
> 
> Bye!
> Giacomo
> 
> -- 
> Giacomo Montagner
> Senior System Engineer & RHCE
> SORINT.LAB S.R.L. (http://www.sorintlab.com/)
-------------- next part --------------
A non-text attachment was scrubbed...
Name: lustre.init
Type: application/octet-stream
Size: 1504 bytes
Desc: not available
URL: <http://lists.lustre.org/pipermail/lustre-discuss-lustre.org/attachments/20090624/90ea3677/attachment.obj>