[Lustre-discuss] Newbie question: File locking, synchronicity, order, and ownership

Nicolas Williams Nicolas.Williams at sun.com
Mon Jan 11 11:10:58 PST 2010


On Mon, Jan 11, 2010 at 07:32:24AM -0600, Nochum Klein wrote:
> Thanks, I appreciate your help.  I apologize for not being clearer about my
> intent.  The goal is to implement a fault-tolerant pair of application
> component instances in a hot-warm configuration where the primary and
> secondary instances exchange heartbeats to determine the health/availability
> of their partner.  The partners would be on different nodes.  While the
> primary instance is running it should have exclusive write access to the
> application state.  If the configured number of heartbeats is missed, the
> secondary component instance will try to retrieve the lock on
> the application state (thereby becoming primary).  Given that networks are
> often unreliable the design goal is that the clustered file system should
> ensure that the secondary instance does not assume primary role while the
> actual primary instance is still alive when a network disruption has
> occurred.  So in a sense a controlled pingpong is actually the desired
> effect (where the secondary and primary instances change roles whenever the
> current primary instance fails).  Am I correct that the configuration
> referenced below could support this behavior?

If the client running the primary dies, eventually it will be evicted
from the cluster, its locks will be dropped, and the secondary will be
able to take over.

If the application running on the primary hangs while holding the lock,
then the secondary will not be able to take over.

I would recommend implementing your own lock system.  A simple lockfile,
opened with O_EXCL|O_CREAT, should suffice.
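
As a rough illustration of the lockfile idea, here is a minimal sketch in
Python.  The function names (try_acquire, release), the lockfile contents,
and the file mode are assumptions made for the example, not anything Lustre
prescribes.  It relies only on the fact that opening with O_CREAT|O_EXCL
fails if the file already exists, so at most one instance can create it.

```python
import os
import socket


def try_acquire(lock_path):
    """Try to create the lockfile; return True if this process created it.

    Hypothetical helper: O_CREAT | O_EXCL makes the open fail with
    FileExistsError when the lockfile is already present.
    """
    try:
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return False  # another instance already holds the lock
    # Record who holds the lock (an assumption, useful for debugging only).
    with os.fdopen(fd, "w") as f:
        f.write(f"{socket.gethostname()} {os.getpid()}\n")
    return True


def release(lock_path):
    """Give up the lock by removing the lockfile."""
    os.unlink(lock_path)
```

The secondary would call try_acquire() after the configured number of missed
heartbeats and assume the primary role only if it returns True.  How to deal
with a lockfile left behind by a primary that died without calling release()
is not covered by this sketch.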

Nico
-- 


