[Lustre-discuss] Question on setting up fail-over

laotsao 老曹 laotsao at gmail.com
Tue Aug 10 09:44:56 PDT 2010



On 8/10/2010 12:20 PM, David Noriega wrote:
> Another question. Is it possible to use centos/redhat's clustering
> software?
The main issue, IMHO, is that Lustre today uses the physical hostname/IP
for all MDS, OSS, MGS, etc.
Cluster software uses a VIP, so some work needs to be done to make a
VIP work with Lustre.
My 2c.
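
To illustrate the physical-address point: as far as I know, a Lustre client
names every candidate server NID itself instead of following a VIP, by listing
the MGS NIDs separated by colons in the device string given to
"mount -t lustre". A minimal sketch of building that string follows; the NIDs,
the filesystem name, and the helper name client_mount_spec are made up for
illustration, not taken from anyone's setup.

```python
def client_mount_spec(mgs_nids, fsname):
    """Build the device string for `mount -t lustre`, listing every MGS NID.

    mgs_nids: physical NIDs of the primary and failover MGS nodes
              (hypothetical values in the example below).
    fsname:   Lustre filesystem name (also hypothetical).
    """
    if not mgs_nids:
        raise ValueError("at least one MGS NID is required")
    # The client tries the NIDs in order, so no virtual IP is involved.
    return ":".join(mgs_nids) + ":/" + fsname


# Example with invented addresses for a two-node MGS failover pair.
spec = client_mount_spec(["10.0.0.1@tcp0", "10.0.0.2@tcp0"], "testfs")
print(spec)
```

The resulting string would be used as the device argument, e.g.
"mount -t lustre <spec> /mnt/testfs". This is why an HA package that only
moves a VIP between nodes does not fit without extra work.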

> In the manual it mentions using that for metadata
> failover (since having more than one metadata server online isn't
> possible right now), so why not use that for all of lustre? But since
> the information is missing, can someone fill in the blanks on setting
> up metadata failover?
>
> David
>
> On Tue, Aug 10, 2010 at 11:11 AM, Kevin Van Maren
> <kevin.van.maren at oracle.com>  wrote:
>> Depends on the HA package you are using.  Heartbeat comes with a script that
>> supports IPMI.
>>
>> The important thing is that stonith NOT succeed if you don't _know_ that the
>> node is off.
>> So it is absolutely not a 1-line script.
>>
>> Kevin
>>
>>
>> David Noriega wrote:
>>> I think I'll go the IPMI route. So reading on STONITH, it's just a
>>> script, so all I would need is a script to run IPMI that tells the
>>> server to power off, right?
>>>
>>> Also while reading through the lustre manual, seems some things are
>>> being deleted from the wiki,
>>> http://wiki.lustre.org/index.php?title=Clu_Manager no longer exists,
>>> and noticed this too when I found the lustre quick guide is no longer
>>> available.
>>>
>>> Thanks
>>> David
>>>
>>> On Tue, Aug 10, 2010 at 10:57 AM, Kevin Van Maren
>>> <kevin.van.maren at oracle.com>  wrote:
>>>
>>>> David Noriega wrote:
>>>>
>>>>> Could you describe this resource fencing in more detail? As regards
>>>>> STONITH, the PDU already has the grubby hands of IT plugged
>>>>> into it and I doubt they would be happy if I unplugged them.  What about
>>>>> the network management port or ILOM?
>>>>>
>>>>>
>>>> Resource fencing is needed to ensure that a node does not take over a
>>>> resource (ie, OST) while the other node is still accessing it (as could
>>>> happen if the node only partly crashes, where it is not responding to
>>>> the HA package but still writing to the disk).
>>>>
>>>> STONITH is a pretty common way to ensure the other node is dead and can
>>>> no longer access the resource.  If you can't use your switched PDU, then
>>>> using the ILOM for IPMI-based power control works.  The other common way
>>>> to do resource fencing is to use scsi reserve commands (if supported by
>>>> the hardware and the HA package) to ensure exclusive access.
>>>>
>>>> Kevin
>>>>
>>>>
>>>>> On Mon, Aug 9, 2010 at 1:08 PM, Kevin Van Maren
>>>>> <Kevin.Van.Maren at oracle.com>  wrote:
>>>>>
>>>>>
>>>>>> On Aug 9, 2010, at 11:45 AM, David Noriega<tsk133 at my.utsa.edu>  wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>> My understanding of setting up fail-over is you need some control over
>>>>>>> the power so with a script it can turn off a machine by cutting its
>>>>>>> power? Is this correct?
>>>>>>>
>>>>>>>
>>>>>> It is the recommended configuration because it is simple to understand
>>>>>> and implement.
>>>>>>
>>>>>> But the only _hard_ requirement is that both nodes can access the
>>>>>> storage.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>> Is there a way to do fail-over without having
>>>>>>> access to the pdu(power strips)?
>>>>>>>
>>>>>>>
>>>>>> If you have IPMI support, that can be used for power control, instead
>>>>>> of a switched PDU.  Depending on the storage, you may be able to do
>>>>>> resource fencing of the disks instead of STONITH.  Or you can run
>>>>>> fast-and-loose, without any way to ensure the dead node is really "dead"
>>>>>> and not accessing storage (at your risk).  While Lustre has MMP, it is
>>>>>> really more to protect against a mount typo than to guarantee resource
>>>>>> fencing.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>> Thanks
>>>>>>> David
>>>>>>>
>>>>>>> --
>>>>>>> Personally, I liked the university. They gave us money and facilities,
>>>>>>> we didn't have to produce anything! You've never been out of college!
>>>>>>> You don't know what it's like out there! I've worked in the private
>>>>>>> sector. They expect results. -Ray Ghostbusters
>>>>>>>
>>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>>
>>
>
>
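
On Kevin's point (quoted above) that STONITH must NOT succeed unless you
_know_ the node is off: here is a rough sketch of what that means for an
IPMI-based script. It assumes ipmitool is installed and that the ILOM answers
"chassis power status" with a line ending in "is on" or "is off"; the function
names, retry counts, and the password-file approach are my own choices, not
what Heartbeat's bundled script does.

```python
import subprocess
import time


def ipmitool(host, user, password_file, *args):
    """Run ipmitool against a BMC/ILOM; return (exit_code, stdout)."""
    cmd = ["ipmitool", "-I", "lanplus", "-H", host, "-U", user,
           "-f", password_file, *args]
    proc = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
    return proc.returncode, proc.stdout


def power_state(run):
    """Return "on", "off", or None when the state cannot be determined."""
    try:
        code, out = run("chassis", "power", "status")
    except (OSError, subprocess.SubprocessError):
        return None  # BMC unreachable or ipmitool missing: state unknown
    if code != 0:
        return None
    text = out.strip().lower()
    if text.endswith("is off"):
        return "off"
    if text.endswith("is on"):
        return "on"
    return None


def fence(run, attempts=5, delay=2.0):
    """Return True only if the node is confirmed powered off."""
    try:
        run("chassis", "power", "off")
    except (OSError, subprocess.SubprocessError):
        pass  # still poll below: the node may already be off
    for _ in range(attempts):
        # Only a positive "off" reading counts; "unknown" is a failure.
        if power_state(run) == "off":
            return True
        time.sleep(delay)
    return False
```

To use it, bind the connection details first, for example with
functools.partial(ipmitool, "oss2-ilom", "admin", "/etc/ha/ipmi.pw") (all
three values hypothetical), pass the result to fence(), and exit non-zero
when it returns False so the HA package does not take over the OST. Note that
an unreachable ILOM is treated as a failed fence, which is the whole reason
this is more than a one-line script.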