[Lustre-discuss] Network aliasing and HA

Kevin Van Maren Kevin.Vanmaren at Sun.COM
Tue Sep 23 08:09:57 PDT 2008


Note that you do not normally use IP takeover with Lustre/Heartbeat: you 
set the failover IP addresses with the mkfs.lustre command, and Lustre 
reconnects to the _other_ address when it is disconnected.

In your case, you would have 2 fixed addresses for each node, configured 
outside Heartbeat (do NOT use the Heartbeat virtual IP addresses), and 
specify both of those addresses as failover NIDs (rather than just 1).
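
As a rough sketch of what that could look like (not a tested recipe): the 
script below only prints the mkfs.lustre commands rather than running them, 
since mkfs.lustre is destructive. The fixed addresses 10.4.2x.51/.52, the 
fsname "testfs" and the device paths are hypothetical placeholders; only 
the 10.4.21/10.4.22 networks and the eth0/eth1 interfaces come from this 
thread. It assumes a combined MGS/MDT formatted on node 1.

```shell
#!/bin/sh
# Dry run: build and print the commands, do not execute them.
# Assumes LNET is loaded on the servers with (no IP addresses needed):
#   options lnet networks=tcp0(eth0),tcp1(eth1)

# Fixed (non-Heartbeat) addresses, one NID per physical network.
# HYPOTHETICAL values: substitute the real static addresses.
NODE1_NIDS="10.4.21.51@tcp0,10.4.22.51@tcp1"
NODE2_NIDS="10.4.21.52@tcp0,10.4.22.52@tcp1"

FSNAME="testfs"        # hypothetical filesystem name
MDT_DEV="/dev/drbd0"   # hypothetical DRBD device holding the MDT
OST_DEV="/dev/sdb"     # hypothetical OST device

# Combined MGS/MDT formatted on node 1; both NIDs of node 2 are given
# as the failover partner.
MDT_CMD="mkfs.lustre --fsname=$FSNAME --mgs --mdt --failnode=$NODE2_NIDS $MDT_DEV"

# OST: one --mgsnode option per MGS node, each listing both of its NIDs.
OST_CMD="mkfs.lustre --fsname=$FSNAME --ost --mgsnode=$NODE1_NIDS --mgsnode=$NODE2_NIDS $OST_DEV"

echo "$MDT_CMD"
echo "$OST_CMD"
```

Within one option, commas separate the NIDs of a single node; the second 
node is given through --failnode (for the target itself) or a repeated 
--mgsnode (for targets that need to find the MGS).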

Lustre 1.6 is a bit different from a lot of HA/Heartbeat setups: Lustre 
_knows_ about the multiple paths/addresses, and simply requires 
Heartbeat to ensure it is mounted on exactly one node in the failover 
pair. It does NOT rely on IP takeover for HA.
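
To illustrate, assuming Heartbeat v1-style haresources (which the 
IPaddr2::... notation in the quoted message suggests), the resource line 
might shrink to something like the fragment below. The hostname "mds1", 
the DRBD resource "r0", the device and the mount point are all 
hypothetical; the point is only that no IPaddr2 entries are listed.

```
# /etc/ha.d/haresources, identical on both nodes of the failover pair.
# Hypothetical names: mds1 (preferred node), r0 (DRBD resource),
# /dev/drbd0 (MDT device), /mnt/mdt (mount point).
# Heartbeat only promotes DRBD and mounts the target; no virtual IPs.
mds1 drbddisk::r0 Filesystem::/dev/drbd0::/mnt/mdt::lustre
```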

Kevin Van Maren


Timh Bergström wrote:
> 2008/9/23 Brian J. Murrell <Brian.Murrell at sun.com>:
>   
>> On Tue, 2008-09-23 at 15:06 +0200, Timh Bergström wrote:
>>     
>>> Hi,
>>>       
>> Hi,
>>     
> Hi again, and thanks for the quick reply!
>
>   
>>> My (current) modprobe:
>>>
>>> options lnet networks=tcp0(eth0)10.4.21.50,tcp1(eth1)10.4.22.50
>>>       
>> This syntax is incorrect.  For some examples of multi-homed
>> configurations see the manual at
>> http://manual.lustre.org/manual/LustreManual16_HTML/MoreComplicatedConfigurations.html#50642998_20213
>>     
>
> Yes, that's the link I've been consulting; perhaps I'm not looking hard enough.
>
>   
>>> These are the errors I get:
>>> LustreError: 10f-e: Error parsing
>>> 'networks="tcp0(eth0)10.4.21.50,tcp1(eth1)10.4.22.50"'
>>>       
>> When you use "networks", you specify the interfaces to use, so you
>> don't need to specify the IP address.  I think you are confusing the
>> networks and ip2nets options.
>>     
>
> The exact problem is that the physical interfaces are there, but not
> with the IP addresses I want the MDT to "listen" on (the "NIDs").
> Those are added later by Heartbeat as aliases (IPaddr2::10.4.21.50
> IPaddr2::10.4.22.50), but before mounting the MDT resource (DRBD).
>
>   
>>> LustreError: 110-0: here...............................|---------|
>>> LustreError: 4527:0:(events.c:707:ptlrpc_init_portals()) network
>>> initialisation failed
>>> (along with a bunch of errors since this module does not load)
>>>       
>>> I've tried tcp0(eth0:0), which fails with about the same error, and
>>> tcp0(eth0,eth1), which works but gives me the wrong addresses (the
>>> machine's own).
>>>       
>> What is the topology exactly?  Are there two nics or one nic with two
>> addresses?  Are the two nics on the same physical network or separate
>> physical networks?
>>     
>
> eth0 and eth1 are physical interfaces with statically assigned IPs
> (for management, supervision, etc.); Heartbeat then adds addresses
> to these two interfaces if the node is "primary".
>
> If it matters: eth0 and eth1 have separate physical paths to
> everything, because we want to survive a physical failure on the
> network before failing over to another physical server.
>
> As I read the manual, I format my OSTs with more than one --mgsnode
> option, which in turn makes the OST "know" about both paths to
> the MDS/MGS server(s). That is, if the first MGS does not work (physical
> network failure on side A), try the second (physical side B).
>
> What we health-check is the data/disks/server hardware, which will
> tell Heartbeat to fail over to server 2, which takes over network path
> A and network path B (on 10.4.[21,22].50), and the OSTs/clients
> should continue working without noticing.
>
>   
>> b.



