[Lustre-discuss] Question on setting up fail-over

Kevin Van Maren kevin.van.maren at oracle.com
Tue Aug 10 09:11:54 PDT 2010


Depends on the HA package you are using.  Heartbeat comes with a script 
that supports IPMI.

The important thing is that STONITH must NOT report success unless you
_know_ that the node is off.
So it is absolutely not a one-line script.

Kevin
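
To illustrate the point above about STONITH only succeeding when the node is known to be off, here is a minimal sketch in Python. It is not the script that ships with Heartbeat. It assumes `ipmitool` is installed and that the ILOM/BMC is reachable over the `lanplus` interface. The function names (`run_command`, `ipmi_argv`, `fence_node`) and the retry parameters are hypothetical, chosen only for this example.

```python
import subprocess
import time


def run_command(argv):
    """Run argv and return (exit_code, stdout).

    A command that cannot be launched or that times out counts as a failure.
    """
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return 1, ""
    return proc.returncode, proc.stdout


def ipmi_argv(host, user, password_file, action):
    """Build an ipmitool 'chassis power <action>' command line."""
    return ["ipmitool", "-I", "lanplus", "-H", host, "-U", user,
            "-f", password_file, "chassis", "power", action]


def fence_node(host, user, password_file, run=run_command,
               attempts=5, delay=2.0, sleep=time.sleep):
    """Return True only if the node is confirmed to be powered off.

    The exit code of 'power off' alone is not trusted: the power state is
    read back, and anything other than a clear "off" answer is a failure.
    """
    run(ipmi_argv(host, user, password_file, "off"))
    for _ in range(attempts):
        code, out = run(ipmi_argv(host, user, password_file, "status"))
        if code == 0 and out.strip().lower() == "chassis power is off":
            return True
        sleep(delay)  # give the BMC time to act, then ask again
    # Unreachable BMC, unexpected output, or node still on: do NOT claim success.
    return False
```

A wrapper called by the HA package would exit 0 only when `fence_node(...)` returns True and non-zero otherwise, so that an unreachable ILOM is never mistaken for a dead node. The `run` and `sleep` parameters exist only so the logic can be exercised without real hardware.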


David Noriega wrote:
> I think I'll go the IPMI route. So reading up on STONITH, it's just a
> script, so all I would need is a script to run IPMI that tells the
> server to power off, right?
>
> Also, while reading through the Lustre manual, it seems some things are
> being deleted from the wiki:
> http://wiki.lustre.org/index.php?title=Clu_Manager no longer exists,
> and I noticed the same when I found that the Lustre quick guide is no
> longer available.
>
> Thanks
> David
>
> On Tue, Aug 10, 2010 at 10:57 AM, Kevin Van Maren
> <kevin.van.maren at oracle.com> wrote:
>   
>> David Noriega wrote:
>>     
>>> Could you describe this resource fencing in more detail? As regards
>>> STONITH, the PDU already has the grubby hands of IT plugged into it,
>>> and I doubt they would be happy if I unplugged them.  What about
>>> the network management port or ILOM?
>>>
>>>       
>> Resource fencing is needed to ensure that a node does not take over a
>> resource (i.e., an OST) while the other node is still accessing it (as
>> could happen if the node only partly crashes, where it is not responding
>> to the HA package but is still writing to the disk).
>>
>> STONITH is a pretty common way to ensure the other node is dead and can no
>> longer access the resource.  If you can't use your switched PDU, then using
>> the ILOM for IPMI-based power control works.  The other common way to do
>> resource fencing is to use SCSI reserve commands (if supported by the
>> hardware and the HA package) to ensure exclusive access.
>>
>> Kevin
>>
>>     
>>> On Mon, Aug 9, 2010 at 1:08 PM, Kevin Van Maren
>>> <Kevin.Van.Maren at oracle.com> wrote:
>>>
>>>       
>>>> On Aug 9, 2010, at 11:45 AM, David Noriega <tsk133 at my.utsa.edu> wrote:
>>>>
>>>>
>>>>         
>>>>> My understanding of setting up fail-over is that you need some control
>>>>> over the power, so that a script can turn off a machine by cutting its
>>>>> power. Is this correct?
>>>>>
>>>>>           
>>>> It is the recommended configuration because it is simple to understand
>>>> and implement.
>>>>
>>>> But the only _hard_ requirement is that both nodes can access the
>>>> storage.
>>>>
>>>>
>>>>
>>>>         
>>>>> Is there a way to do fail-over without having
>>>>> access to the PDU (power strips)?
>>>>>
>>>>>           
>>>> If you have IPMI support, that can be used for power control instead of a
>>>> switched PDU.  Depending on the storage, you may be able to do resource
>>>> fencing of the disks instead of STONITH.  Or you can run fast-and-loose,
>>>> without any way to ensure the dead node is really "dead" and not accessing
>>>> storage (at your risk).  While Lustre has MMP, it is really more to protect
>>>> against a mount typo than to guarantee resource fencing.
>>>>
>>>>
>>>>
>>>>         
>>>>> Thanks
>>>>> David
>>>>>
>>>>> --
>>>>> Personally, I liked the university. They gave us money and facilities,
>>>>> we didn't have to produce anything! You've never been out of college!
>>>>> You don't know what it's like out there! I've worked in the private
>>>>> sector. They expect results. -Ray Ghostbusters