[Lustre-discuss] Question on setting up fail-over

David Noriega tsk133 at my.utsa.edu
Tue Aug 10 09:03:06 PDT 2010


I think I'll go the IPMI route. From reading about STONITH, it's just a
script, so all I would need is a script that runs IPMI to tell the
server to power off, right?
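
A minimal sketch of what such a script might look like, assuming
ipmitool is installed on the surviving node and the peer's ILOM/BMC is
reachable over the network. The helper names, the BMC address and the
user name are hypothetical, and this is not the interface any particular
HA package expects from a STONITH plugin. The password is read from the
IPMI_PASSWORD environment variable (ipmitool's -E option) so it does not
appear on the command line.

```python
import subprocess


def build_ipmi_command(bmc_host, user, action):
    """Return the ipmitool argv for a chassis power action (hypothetical helper)."""
    if action not in ("off", "status"):
        raise ValueError(f"unsupported action: {action}")
    # -I lanplus: IPMI v2.0 over LAN; -E: take the password from IPMI_PASSWORD
    return ["ipmitool", "-I", "lanplus", "-H", bmc_host, "-U", user, "-E",
            "chassis", "power", action]


def is_powered_off(status_output):
    """Assumes ipmitool's usual 'Chassis Power is off' wording; anything else is 'not off'."""
    return status_output.strip().lower().endswith("is off")


def fence_node(bmc_host, user):
    """Power the peer off, then ask for its status; True only if it reports off."""
    subprocess.run(build_ipmi_command(bmc_host, user, "off"),
                   check=True, timeout=30)
    result = subprocess.run(build_ipmi_command(bmc_host, user, "status"),
                            check=True, capture_output=True, text=True,
                            timeout=30)
    return is_powered_off(result.stdout)
```

Calling fence_node("10.0.0.5", "admin") (made-up values) would power the
peer off and return True only if the BMC then reports it as off. A
failover script should treat any exception or a False result as "not
fenced" and refuse to take over the OST.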

Also, while reading through the Lustre manual, it seems some things are
being deleted from the wiki:
http://wiki.lustre.org/index.php?title=Clu_Manager no longer exists,
and I noticed the same when I found the Lustre quick guide is no longer
available.

Thanks
David

On Tue, Aug 10, 2010 at 10:57 AM, Kevin Van Maren
<kevin.van.maren at oracle.com> wrote:
> David Noriega wrote:
>>
>> Could you describe this resource fencing in more detail? As regards
>> STONITH, the PDU already has the grubby hands of IT plugged into it,
>> and I doubt they would be happy if I unplugged them.  What about
>> the network management port or ILOM?
>>
>
> Resource fencing is needed to ensure that a node does not take over a
> resource (i.e., an OST) while the other node is still accessing it (as
> could happen if the node only partly crashes, where it is not responding
> to the HA package but is still writing to the disk).
>
> STONITH is a pretty common way to ensure the other node is dead and can
> no longer access the resource.  If you can't use your switched PDU, then
> using the ILOM for IPMI-based power control works.  The other common way
> to do resource fencing is to use SCSI reserve commands (if supported by
> the hardware and the HA package) to ensure exclusive access.
>
> Kevin
>
>> On Mon, Aug 9, 2010 at 1:08 PM, Kevin Van Maren
>> <Kevin.Van.Maren at oracle.com> wrote:
>>
>>>
>>> On Aug 9, 2010, at 11:45 AM, David Noriega <tsk133 at my.utsa.edu> wrote:
>>>
>>>
>>>>
>>>> My understanding of setting up fail-over is that you need some control
>>>> over the power, so that a script can turn off a machine by cutting its
>>>> power. Is this correct?
>>>>
>>>
>>> It is the recommended configuration because it is simple to understand
>>> and implement.
>>>
>>> But the only _hard_ requirement is that both nodes can access the
>>> storage.
>>>
>>>
>>>
>>>>
>>>> Is there a way to do fail-over without having
>>>> access to the PDU (power strips)?
>>>>
>>>
>>> If you have IPMI support, that can be used for power control instead of
>>> a switched PDU.  Depending on the storage, you may be able to do
>>> resource fencing of the disks instead of STONITH.  Or you can run
>>> fast-and-loose, without any way to ensure the dead node is really "dead"
>>> and not accessing storage (at your own risk).  While Lustre has MMP, it
>>> is really more to protect against a mount typo than to guarantee
>>> resource fencing.
>>>
>>>
>>>
>>>>
>>>> Thanks
>>>> David
>>>>



-- 
Personally, I liked the university. They gave us money and facilities,
we didn't have to produce anything! You've never been out of college!
You don't know what it's like out there! I've worked in the private
sector. They expect results. -Ray Ghostbusters
