[Lustre-discuss] Question about failnode

Thu Oct 25 07:29:14 PDT 2007

On Thu, 2007-10-25 at 20:58 +0900, Kazuki Ohara wrote:
> Hi,
> I have a question about the failnode directive of mkfs.lustre.
> I hope someone can help me.
> 
> When a shared volume is formatted and mounted as below,
>   [root@ossnode1 ~]# mkfs.lustre --ost --failnode=ossnode2 \
>   > --mgsnode=mgsnode@tcp /dev/sda1
>   [root@ossnode1 ~]# mount -t lustre /dev/sda1 /mnt/ost1
> ossnode1 knows that sda1 can be accessed by ossnode2.
> 
> Then, a failover occurs and ossnode2 mounts sda1,

Before ossnode2 does the mount, it should in fact make as sure as it
can that ossnode1 does not have the disk mounted.  The surefire way to
do that is to kill the power to ossnode1 (assuming there is a power
controller between the mains and ossnode1 that ossnode2 can operate).

In the failover game this is called STONITH, an acronym for "Shoot The
Other Node In The Head".  All of this is usually coordinated with
something like Heartbeat.

The reason for this STONITH action is that in an HA scenario, ossnode2
only knows that it cannot reach ossnode1.  It does not know why.  It
could be because its power failed, it panicked, or any number of other
reasons.  Not all of those reasons imply that ossnode1 no longer has
the disk mounted, though.  Only by killing ossnode1 itself can ossnode2
be absolutely sure that ossnode1 does not have the disk mounted.

Having more than one node mount an ext{2,3,4} or ldiskfs (which is
ext4, basically) filesystem at the same time is disastrous for that
filesystem, so every measure necessary to prevent that needs to be
taken.
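
As a rough sketch of that ordering, assuming a POSIX shell on ossnode2:
fence the peer first, and mount only if the fencing succeeded.  FENCE_CMD
below is a hypothetical placeholder for whatever operates your power
controller; it is not a Lustre or Heartbeat command, and in practice
something like Heartbeat would normally drive this step for you.

```shell
#!/bin/sh
# Sketch only: takeover order on the surviving node (fence, then mount).
# FENCE_CMD is a hypothetical script that powers off the named node and
# exits 0 only when the power-off is confirmed.
FENCE_CMD=${FENCE_CMD:-/usr/local/sbin/power-off-node}
MOUNT_CMD=${MOUNT_CMD:-mount}

takeover() {
    peer=$1
    dev=$2
    mnt=$3
    # Never mount unless the peer has been confirmed dead.
    if ! "$FENCE_CMD" "$peer"; then
        echo "fencing $peer failed; not mounting $dev" >&2
        return 1
    fi
    "$MOUNT_CMD" -t lustre "$dev" "$mnt"
}
```

With the names from the question, ossnode2 would call it as
"takeover ossnode1 /dev/sda1 /mnt/ost1".  The point is only the order:
if the fencing step cannot be confirmed, the mount is never attempted.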

> I think there is no way for ossnode2 to know sda1 can be accessed by ossnode1.
> Does this become a problem?

It does, hence the steps above.

b.