[Lustre-discuss] Network aliasing and HA

Timh Bergström timh.bergstrom at diino.net
Thu Sep 25 13:54:59 PDT 2008


To follow up on this matter: I've now set up HA/DRBD as suggested,
formatted the OSTs with two --mgsnode directives, and mounted the
clients with both addresses, as ip1@tcp0:ip2@tcp1:/fsname. However,
if I fail MGS/MDT 1, it does not recover (in a reasonable time).
Which tuning parameters or settings affect this?
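
For reference, here is a minimal sketch of how the client mount string
could be assembled, assuming (as I understand the Lustre manual's mount
syntax) that commas separate the NIDs of one node and colons separate
the nodes of a failover pair. The helper names and the fixed addresses
10.4.21.51/10.4.22.51 and 10.4.21.52/10.4.22.52 are hypothetical
placeholders, not values from this setup.

```python
# Sketch only: builds a Lustre client mount spec from per-node NID lists.
# Assumption: NIDs of the same node are joined with ",", and failover
# nodes are joined with ":". All addresses below are hypothetical.

def format_nid(address, network):
    """Return an LNET NID string such as '10.4.21.51@tcp0'."""
    return f"{address}@{network}"


def build_mount_spec(nodes, fsname):
    """Build '<node1 NIDs>:<node2 NIDs>:/<fsname>'.

    nodes is a list of servers; each server is a list of
    (address, network) pairs, one pair per interface.
    """
    node_specs = [
        ",".join(format_nid(address, network) for address, network in nids)
        for nids in nodes
    ]
    return ":".join(node_specs) + ":/" + fsname


# Hypothetical fixed (non-Heartbeat) addresses for the two MGS/MDT nodes.
primary = [("10.4.21.51", "tcp0"), ("10.4.22.51", "tcp1")]
secondary = [("10.4.21.52", "tcp0"), ("10.4.22.52", "tcp1")]

print(build_mount_spec([primary, secondary], "fsname"))
# prints 10.4.21.51@tcp0,10.4.22.51@tcp1:10.4.21.52@tcp0,10.4.22.52@tcp1:/fsname
```

The resulting string would be used as the device argument of
"mount -t lustre <spec> <mountpoint>". Note that under this reading,
ip1@tcp0:ip2@tcp1 describes two separate nodes rather than two paths to
the same node.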

//Timh

2008/9/23 Timh Bergström <timh.bergstrom at diino.net>:
> Thank you, that's the path I've taken since the last message on this
> list, as I misunderstood some of the DRBD/HA setups before. However,
> is using 4 mgsnode "paths" recommended, or should I use one
> mgsnode path per node and keep the other for some sort of manual failover?
>
> Regards,
> Timh
>
> 2008/9/23 Kevin Van Maren <Kevin.Vanmaren at sun.com>:
>> Note that you do not normally use IP takeover with Lustre/Heartbeat: you set
>> the failover IP addresses with the mkfs.lustre command, and Lustre
>> reconnects to the _other_ address when it is disconnected.
>>
>> In your case, you would have 2 fixed addresses for each node (w/o heartbeat
>> - do NOT use the heartbeat virtual IP addresses), and specify both those
>> failover NIDs (rather than just 1).
>>
>> Lustre 1.6 is a bit different from what a lot of HA/Heartbeat users expect:
>> Lustre _knows_ about the multiple paths/addresses, and simply requires
>> Heartbeat to ensure it is mounted on exactly one node in the failover pair;
>> it does NOT rely on IP takeover for HA.
>>
>> Kevin Van Maren
>>
>>
>> Timh Bergström wrote:
>>>
>>> 2008/9/23 Brian J. Murrell <Brian.Murrell at sun.com>:
>>>
>>>>
>>>> On Tue, 2008-09-23 at 15:06 +0200, Timh Bergström wrote:
>>>>
>>>>>
>>>>> Hi,
>>>>>
>>>>
>>>> Hi,
>>>>
>>>
>>> Hi again, and thanks for the quick reply!
>>>
>>>
>>>>>
>>>>> My (current) modprobe:
>>>>>
>>>>> options lnet networks=tcp0(eth0)10.4.21.50,tcp1(eth1)10.4.22.50
>>>>>
>>>>
>>>> This syntax is incorrect.  For some examples of multi-homed
>>>> configurations see the manual at
>>>>
>>>> http://manual.lustre.org/manual/LustreManual16_HTML/MoreComplicatedConfigurations.html#50642998_20213
>>>>
>>>
>>> Yes, that's the link I've been consulting; perhaps I'm not looking hard
>>> enough.
>>>
>>>
>>>>>
>>>>> These are the errors I get:
>>>>> LustreError: 10f-e: Error parsing
>>>>> 'networks="tcp0(eth0)10.4.21.50,tcp1(eth1)10.4.22.50"'
>>>>>
>>>>
>>>> When you specify "networks", you already name the interfaces to use,
>>>> so you don't need to specify the IP address.  I think you are confusing
>>>> the networks and ip2nets options.
>>>>
>>>
>>> The problem here is exactly that the physical interfaces are there, but
>>> not with the IP addresses I want the MDT to "listen" on (the "NIDs");
>>> they are added later through Heartbeat as aliases (IPaddr2::10.4.21.50
>>> IPaddr2::10.4.22.50), but before mounting the MDT resource (DRBD).
>>>
>>>
>>>>>
>>>>> LustreError: 110-0: here...............................|---------|
>>>>> LustreError: 4527:0:(events.c:707:ptlrpc_init_portals()) network
>>>>> initialisation failed
>>>>> (along with a bunch of errors since this module does not load)
>>>>> I've tried tcp0(eth0:0), which fails with about the same error, and
>>>>> I've tried tcp0(eth0,eth1), which gives me the wrong addresses (the
>>>>> machine's own) but works.
>>>>>
>>>>
>>>> What is the topology exactly?  Are there two nics or one nic with two
>>>> addresses?  Are the two nics on the same physical network or separate
>>>> physical networks?
>>>>
>>>
>>> eth0 and eth1 are physical interfaces; they have statically assigned
>>> IPs (for management, supervision, etc.). Heartbeat then adds addresses
>>> to these two interfaces if the node is "primary".
>>>
>>> If it matters: eth0 and eth1 have separate physical paths to
>>> everything, because we want to survive a physical failure on the
>>> network before failing over to another physical server.
>>>
>>> As I read the manual, I format my OSTs with more than one --mgsnode
>>> option, which in turn makes the OST "know" about both paths to
>>> the MDS/MGS server(s). That is, if the first MGS does not respond
>>> (physical network failure on side A), try the second (physical side B).
>>>
>>> What we health-check is the data/disks/server hardware, which will
>>> tell Heartbeat to fail over to server 2, which takes over network path
>>> A and network path B (on 10.4.[21,22].50), and the OSTs/clients
>>> should continue working without noticing.
>>>
>>>
>>>>
>>>> b.
>>>>
>>>>
>>>> _______________________________________________
>>>> Lustre-discuss mailing list
>>>> Lustre-discuss at lists.lustre.org
>>>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>
>>
>>
>
>
>
> --
> Timh Bergström
> System Administrator
> Diino AB - www.diino.com
> :wq
>



-- 
Timh Bergström
System Administrator
Diino AB - www.diino.com
:wq


