[Lustre-discuss] tcp network load balancing understanding lustre 1.8

Klaus Steden klaus.steden at technicolor.com
Thu May 7 15:02:49 PDT 2009


Hi Michael,

Just want to throw my two cents in with Isaac's posting, as I spent a great
deal of time working with these kinds of features over the course of the
last two years.

In my experience with Lustre 1.6, when multiple NICs are available, Lustre
defaults to using the first one exclusively until it detects a failure, and
then switches over to the next available interface. It also does not
distinguish between NIC types; IB, GigE, etc. are picked in discovery order
rather than by speed or any other metric.
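If you want to override that behaviour, you can name the interface(s) LNET
should use explicitly in modprobe.conf; the interface names below are only
placeholders for whatever your nodes actually carry:

# pin tcp0 to a particular NIC instead of the first one discovered
options lnet networks=tcp0(eth1)

# or keep IB and GigE on separate LNET networks
options lnet networks="o2ib0(ib0),tcp0(eth0)"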

I didn't even touch Lustre bonding, because, as you both remark, it's a
little convoluted. I spent a lot of time experimenting with Lustre over
802.3ad (LACP) aggregated links using the Linux bonding driver, and my OSS
nodes produced very respectable numbers. Across a pair of OSS nodes, each
with 2 x GigE NICs, I was able to sustain ~350 MB/s write speed in sandbox
tests. So although the LACP driver doesn't balance a single connection
across multiple links (i.e. a 2 x GigE LACP bond doesn't give you 2 Gbit of
throughput for one network I/O), the Lustre implementation still manages to
squeeze more data through the pipe, presumably because LACP hashes the many
concurrent client connections across both member links.

To get it set up, simply configure NIC bonding of whatever flavour suits
your needs on the OSS nodes, and then assign 'bond0' to your tcp networks,
something like this:

options lnet networks=tcp0(bond0)

and you should be off to the races.
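For reference, the Linux side of an 802.3ad setup on a RHEL-style node
might look roughly like the following (device names, addresses, and even
the bonding mode are only examples; pick whatever matches your site):

/etc/modprobe.conf:
    alias bond0 bonding
    options bonding mode=802.3ad miimon=100

/etc/sysconfig/network-scripts/ifcfg-bond0:
    DEVICE=bond0
    IPADDR=10.111.20.35
    NETMASK=255.255.255.0
    BOOTPROTO=none
    ONBOOT=yes

/etc/sysconfig/network-scripts/ifcfg-eth0 (and likewise for eth1):
    DEVICE=eth0
    MASTER=bond0
    SLAVE=yes
    BOOTPROTO=none
    ONBOOT=yes

and then point LNET at the bond as above with
'options lnet networks=tcp0(bond0)'.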

hth,
Klaus

On 5/7/09 12:57 PM, "Isaac Huang" <He.Huang at Sun.COM> etched on stone
tablets:

> On Thu, May 07, 2009 at 02:50:13PM +0200, Michael Ruepp wrote:
>> Hi there,
>> ......
>> I give every NID an IP in the same subnet, e.g. 10.111.20.35-38 on oss0
>> and 10.111.20.39-42 on oss1
>> 
>> Do I have to make modprobe.conf.local look like this to force Lustre
>> to use all four interfaces in parallel:
>> 
>> options lnet networks=tcp0(eth0,eth1,eth2,eth3)
>> Because on Page 138 the 1.8 Manual says:
>> "Note - In the case of TCP-only clients, the first available
>> non-loopback IP interface is used for tcp0 since the interfaces are
>> not specified."
> 
> Correct.
> 
>> or do I have to specify it like this:
>> options lnet networks=tcp
>> Because on Page 112 the Lustre 1.6 Manual says:
>> "Note - In the case of TCP-only clients, all available IP interfaces
>> are used for tcp0."
> 
> Wrong. It needs to be updated as well, Sheila?
> 
>> ......
>> My goal is to let Lustre utilize all four Gb links in parallel. My
>> Lustre clients are equipped with two Gb links (eth0, eth1), which
>> should be utilized as well.
>> 
>> Or is bonding the better solution in terms of performance?
> 
> I don't have any performance comparisons between the two approaches,
> but I'd suggest going with Linux bonding instead (let's call the
> tcp0(eth0,...ethN) approach Lustre bonding), because:
> 1. With Lustre bonding it's rather tricky to get routing right,
> especially when all NICs reside in the same IP subnet. The Lustre tcp
> network driver, as its name suggests, works at the TCP layer, and the
> decision as to which outgoing interface to use depends on Linux IP
> layer routing. When all NICs live in the same IP subnet, it's very
> possible that all outgoing packets would go through the interface of
> the first route in the Linux routing table, unless some tweaking has
> been done to also take source IPs into account (sketched below).
> Incoming packets could also come in via unexpected NICs, depending on
> your settings in /proc/sys/net/ipv4/conf/*/arp_ignore and your
> ethernet topology.
> 
> 2. Linux bonding does a good job of detecting link status via either
> the ARP monitor or the MII monitor, but no such mechanism exists in
> Lustre bonding.
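> 
> A rough sketch of what such source-based tweaking could look like, using
> iproute2 policy routing (interface names and addresses here are purely
> illustrative):
> 
>   # send traffic sourced from each address out via the NIC that owns it
>   ip route add 10.111.20.0/24 dev eth0 src 10.111.20.35 table 100
>   ip rule add from 10.111.20.35 table 100
>   ip route add 10.111.20.0/24 dev eth1 src 10.111.20.36 table 101
>   ip rule add from 10.111.20.36 table 101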
> 
> In fact, Lustre bonding is an officially obsoleted feature, if I
> remember correctly.
> 
> Thanks,
> Isaac
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss



