[Lustre-discuss] Network aliasing and HA

Timh Bergström timh.bergstrom at diino.net
Tue Sep 23 07:44:57 PDT 2008


2008/9/23 Brian J. Murrell <Brian.Murrell at sun.com>:
> On Tue, 2008-09-23 at 15:06 +0200, Timh Bergström wrote:
>> Hi,
>
> Hi,
Hi again, and thanks for the quick reply!

>
>> My (current) modprobe:
>>
>> options lnet networks=tcp0(eth0)10.4.21.50,tcp1(eth1)10.4.22.50
>
> This syntax is incorrect.  For some examples of multi-homed
> configurations see the manual at
> http://manual.lustre.org/manual/LustreManual16_HTML/MoreComplicatedConfigurations.html#50642998_20213

Yes, that's the link I've been consulting; perhaps I'm not looking hard enough.

>
>> These are the errors I get:
>> LustreError: 10f-e: Error parsing
>> 'networks="tcp0(eth0)10.4.21.50,tcp1(eth1)10.4.22.50"'
>
> When you specify "networks", you name the interfaces to use, so you
> don't need to specify the IP address.  I think you are confusing the
> networks and ip2nets options.

The problem is exactly that the physical interfaces are there, but not
with the IP addresses I want the MDT to "listen" on (the NIDs). Those
are added later by heartbeat as aliases (IPaddr2::10.4.21.50
IPaddr2::10.4.22.50), before the MDT resource (DRBD) is mounted.
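
To make the difference concrete, here is a rough sketch of the two
forms as I understand them. The second line is only my reading of the
interface-only syntax applied to this machine; it is an assumption and
I have not confirmed that it picks up the aliased addresses:

```
# What I tried; the parser rejects the addresses after the interface names
# options lnet networks=tcp0(eth0)10.4.21.50,tcp1(eth1)10.4.22.50

# Interface-only form: one LNET network per physical NIC
options lnet networks=tcp0(eth0),tcp1(eth1)
```

As far as I can tell, this form uses whatever address each interface
already has when the module loads, which would explain why I end up
with the machine addresses rather than the heartbeat aliases.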

>
>> LustreError: 110-0: here...............................|---------|
>> LustreError: 4527:0:(events.c:707:ptlrpc_init_portals()) network
>> initialisation failed
>> (along with a bunch of errors since this module does not load)
>
>> I've tried tcp0(eth0:0), which fails with about the same error, and
>> I've tried tcp0(eth0,eth1), which works but gives me the wrong
>> addresses (the machine ones).
>
> What is the topology exactly?  Are there two nics or one nic with two
> addresses?  Are the two nics on the same physical network or separate
> physical networks?

eth0 and eth1 are physical interfaces with statically assigned IPs
(for management, supervision, etc.); heartbeat then adds addresses
to these two interfaces if the node is "primary".

If it matters: eth0 and eth1 have separate physical paths to
everything, because we want to survive a physical failure on the
network before failing over to another physical server.

As I read the manual, I format my OSTs with more than one --mgsnode
option, which in turn makes the OST "know" about both paths to
the MDS/MGS server(s). That is, if the first MGS does not work (physical
network failure on side A), try the second (physical side B).

What we health-check is the data/disks/server hardware; a failure there
tells heartbeat to fail over to server 2, which takes over network path
A and network path B (on 10.4.[21,22].50), and the OSTs/clients
should continue working without noticing.

>
> b.



-- 
Timh Bergström
System Administrator
Diino AB - www.diino.com
:wq


