[Lustre-discuss] Filesystem monitoring in Heartbeat

Adam Gandelman adam.gandelman at linbit.com
Thu Jan 21 10:57:20 PST 2010


Jagga Soorma wrote:
> Hi Guys,
>
> My MDT is set up with LVM and I was able to test failover based on the
> Volume Group failing on my MDS (by unplugging both fibre cables).
> However, for my OSTs, I have created filesystems directly on the SAN
> LUNs and when I unplug the fibre cables on my OSS, heartbeat does not
> detect failure for the filesystem since it still shows as mounted.  Is
> there some way we can trigger a failure based on multipath failing on
> the OSS?
>

Hi-

It depends on the version of Heartbeat you are using.  Heartbeat v1
did not do any resource-level monitoring, so if that is what you are
using, you are out of luck.

If you are using the v2 CRM and/or Pacemaker, you have two options:

1. Modify the Filesystem OCF script's monitor operation to check the
   actual health of the filesystem and/or multipath in addition to the
   status of the mount, and return the matching OCF exit code.  The
   Filesystem OCF agent is located at
   /usr/lib/ocf/resource.d/heartbeat/Filesystem
2. Create your own resource agent that interacts with dm/multipath to
   start/stop/monitor it.  Then add order and colocation constraints so
   that it starts before, stops after, and runs on the same node as the
   Filesystem resource.  The filesystem will then depend on the health
   of the multipath resource.
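
To make the first option concrete, here is a rough sketch of the extra
check a monitor operation could perform.  It is written in Python only
for illustration (the stock Filesystem agent is a shell script), and the
function names, the /proc/mounts parsing and the "read the first block"
probe are my own assumptions, not what the stock agent does.  The exit
codes are the standard OCF ones.

```python
import os

# Standard OCF exit codes.
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1
OCF_NOT_RUNNING = 7


def is_mounted(mountpoint, mounts_file="/proc/mounts"):
    """Return True if mountpoint is listed as a mount target."""
    with open(mounts_file) as f:
        for line in f:
            fields = line.split()
            if len(fields) >= 2 and fields[1] == mountpoint:
                return True
    return False


def device_readable(device, nbytes=4096):
    """Hypothetical health probe: try to read the start of the device.

    Caveat: a plain read may be served from the page cache, and may
    block if multipath is configured to queue I/O when no path is left.
    A real check would likely need direct I/O and a timeout.
    """
    try:
        fd = os.open(device, os.O_RDONLY)
    except OSError:
        return False
    try:
        return len(os.read(fd, nbytes)) > 0
    except OSError:
        return False
    finally:
        os.close(fd)


def monitor(mountpoint, device, mounts_file="/proc/mounts"):
    """Map mount status plus device health to an OCF exit code."""
    if not is_mounted(mountpoint, mounts_file):
        return OCF_NOT_RUNNING      # cleanly stopped
    if not device_readable(device):
        return OCF_ERR_GENERIC      # mounted, but the storage is gone
    return OCF_SUCCESS
```

The point is the second branch: a filesystem that still shows as
mounted but whose device cannot be read is reported as failed rather
than healthy, which is what lets the cluster manager react.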

I recommend the second for the sake of thoroughness.  Including
multipath monitoring in the Filesystem OCF agent may "just work", but it
leaves room for other multipath-related failures to go unnoticed.
Writing your own OCF agent is fairly straightforward and is documented
somewhere on www.clusterlabs.org.  There is an OCF script that does the
same for LVM, which would serve as a good example of what needs to be
done.  Or maybe someone else has already created one?  The Linux-HA or
Pacemaker lists might be a good place to ask.
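
As a rough idea of the shape of a standalone agent, here is a minimal
sketch of the start/stop/monitor dispatch, again in Python purely for
illustration.  It uses a state file to remember that the resource was
started.  The check_device probe, the state file and the run_action
name are hypothetical, and a real agent must also implement the
meta-data and validate-all actions, which are left out here.

```python
import os

# Standard OCF exit codes.
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1
OCF_ERR_UNIMPLEMENTED = 3
OCF_NOT_RUNNING = 7


def check_device(device):
    """Hypothetical health probe: only tests that the device node exists.

    A real agent would query dm/multipath for the state of its paths.
    """
    return os.path.exists(device)


def run_action(action, device, state_file):
    """Dispatch one OCF action and return its exit code."""
    if action == "start":
        if not check_device(device):
            return OCF_ERR_GENERIC
        open(state_file, "w").close()   # remember that we are started
        return OCF_SUCCESS
    if action == "stop":
        # Stopping an already stopped resource must still succeed.
        if os.path.exists(state_file):
            os.remove(state_file)
        return OCF_SUCCESS
    if action == "monitor":
        if not os.path.exists(state_file):
            return OCF_NOT_RUNNING
        return OCF_SUCCESS if check_device(device) else OCF_ERR_GENERIC
    # meta-data, validate-all, etc. are omitted from this sketch.
    return OCF_ERR_UNIMPLEMENTED
```

The cluster manager passes the action as the first command-line
argument and the resource parameters as OCF_RESKEY_* environment
variables, so an entry point would read those, call run_action, and
exit with the returned code.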

 
Good luck

-- 
: Adam Gandelman
: LINBIT | Your Way to High Availability
:
: http://www.linbit.com 	  	



