[Lustre-discuss] Heartbeat Failover Issues on MDS

Adam Gandelman adam.gandelman at linbit.com
Tue Jan 19 15:25:43 PST 2010


Jagga Soorma wrote:
> Hi Guys,
>
> I am setting up a heartbeat cluster for my 2 MDS servers.  However, I
> am running into the following issue.  If I power off the passive node
> and heartbeat uncleanly shuts down, then after the server is brought
> back online and the heartbeat services are started, all my resources
> are shut down even though they are running on the active node and then
> brought back online automatically.  Am I missing some settings here?
> Stickiness?  I have been unable to get this to work. 
Without logs it's hard to say, but it sounds like it may be a
resource-stickiness issue. Setting a default resource stickiness of
something high like 1000 or 2000 will usually keep resources stuck to a
node until you tell them to move (with a higher score/INFINITY).  Also,
make sure no services that heartbeat manages are started at boot.  This
includes making sure your MDS and OSS filesystems are not in /etc/fstab.
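
If it helps, here is a rough sketch of a check for the /etc/fstab part.
It only assumes that the Lustre targets are listed with filesystem type
"lustre" in the third field; the function name is made up for this
example and is not part of heartbeat or Lustre.

```python
def lustre_fstab_entries(fstab_text):
    """Return (mount_point, device) pairs for each active fstab line
    whose filesystem type (third field) is 'lustre'."""
    found = []
    for line in fstab_text.splitlines():
        line = line.strip()
        # Skip blank lines and commented-out entries.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # fstab fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 3 and fields[2] == "lustre":
            found.append((fields[1], fields[0]))
    return found
```

Run it on both MDS nodes, for example with
lustre_fstab_entries(open("/etc/fstab").read()).  Anything it returns
is probably something the OS may try to mount on its own at boot,
outside of heartbeat's control.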

Good luck,

-- 
: Adam Gandelman
: LINBIT | Your Way to High Availability
: Sales: 1-877-4-LINBIT / 1-877-454-6248
:
: 7959 SW Cirrus Dr.
: Beaverton, OR 97008
:
: http://www.linbit.com