[Lustre-discuss] Question on setting up fail-over

David Noriega tsk133 at my.utsa.edu
Tue Aug 10 12:15:14 PDT 2010


So your script resets the server so that there is no fail-over (i.e., the
other server never takes over resources from that server)? Or is there
fail-over, but you then manually return the resources to the server that
was reset?

On Tue, Aug 10, 2010 at 1:39 PM, Bernd Schubert
<bs_lists at aakef.fastmail.fm> wrote:
> On Tuesday, August 10, 2010, Kevin Van Maren wrote:
>> Depends on the HA package you are using.  Heartbeat comes with a script
>> that supports IPMI.
>>
>
> For our installations we even use a modified external/ipmi_ddn stonith script
> that uses power-off/status/on to make sure the system is really reset.
> The heartbeat/pacemaker script uses the IPMI reset method by default, but IPMI
> commands are not required by the spec to succeed. So ipmitool (used by
> external/ipmi) might return successfully, but that in no way ensures the node
> was really reset. I have seen that rather often in real life already.
> The default script also supports the power-off/on method, but it also does not
> check the status.
>
> So our modified script first powers off, then checks if the node is really
> offline, then powers on again and only then returns successfully.
> Unfortunately, that comes at the cost of an increased fail-over time, as
> power-off followed by power-on needs some minimal downtime in between (ca. 30s)
> and heartbeat/pacemaker stonith does not support async events (power-off would
> be sufficient, but once stonith returns successfully, it is not called again
> until the next fencing).
>
> --
> Bernd Schubert
> DataDirect Networks
>
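
To check that I follow the sequence described above (power off, confirm
the node is really off, then power on and only then report success), here
is a rough sketch in Python. It is not the ipmi_ddn script, which I have
not seen. The function names, the retry count and delay, and the use of
the lanplus interface are my own assumptions. The only external piece is
"ipmitool ... chassis power off|status|on".

```python
import subprocess
import time


def ipmi_power(host, user, password, action, run=subprocess.run):
    """Run `ipmitool chassis power <action>` against a BMC and return stdout.

    Assumption: the BMC is reachable over the lanplus interface.
    """
    cmd = ["ipmitool", "-I", "lanplus", "-H", host, "-U", user,
           "-P", password, "chassis", "power", action]
    result = run(cmd, capture_output=True, text=True, timeout=30)
    if result.returncode != 0:
        raise RuntimeError("ipmitool %s failed: %s" % (action, result.stderr))
    return result.stdout


def fence_node(power, retries=10, delay=3.0, sleep=time.sleep):
    """Power off, verify the node is off, then power on again.

    `power` is any callable taking "off", "status" or "on" and returning
    the command output as text. Returns True only if the node was seen
    powered off before it was powered on again.
    """
    power("off")
    for _ in range(retries):
        # A successful "off" command alone proves nothing, so poll the
        # status until the BMC reports the chassis as off.
        if "is off" in power("status").lower():
            break
        sleep(delay)
    else:
        return False  # never confirmed off: report the fencing as failed
    power("on")
    return True
```

With a hypothetical BMC address and credentials this would be called as
fence_node(functools.partial(ipmi_power, "oss1-bmc", "admin", "secret")).
Passing the power function in as an argument keeps the off/status/on
logic separate from ipmitool, so it can be exercised without hardware.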



-- 
Personally, I liked the university. They gave us money and facilities,
we didn't have to produce anything! You've never been out of college!
You don't know what it's like out there! I've worked in the private
sector. They expect results. -Ray Ghostbusters


