[Lustre-discuss] tcp0 for maximum effect

Klaus Steden klaus.steden at technicolor.com
Mon Jan 12 14:15:51 PST 2009

Hi folks,

Lustre doesn't support any inherent link aggregation; it simply uses the
device node the OS presents. If this is a bonded NIC, it will use it with no
problem, and the underlying device driver takes care of load balancing and
failover.

I've used Lustre 1.6.x quite successfully with load-balanced 802.3ad
configurations; in some of my tests I was able to get about 350 MB/s
aggregate sustained across two OSS nodes with 2 x GigE bonded each.

802.3ad link aggregation is a standard NIC bonding protocol, supported on
good-quality L3 switches from vendors such as Cisco, Foundry, Extreme, and
Juniper.
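For reference, a minimal Linux 802.3ad bonding setup looks roughly like the
following sketch (interface names, addresses, and the use of ifenslave are
illustrative assumptions; your distribution's network scripts may wrap these
steps differently):

```shell
# Load the bonding driver in 802.3ad (LACP) mode;
# miimon=100 polls link state every 100 ms.
modprobe bonding mode=802.3ad miimon=100 lacp_rate=fast

# Bring up the bond and enslave two GigE ports
# (eth0/eth1 are placeholders for your actual NICs).
ip link set bond0 up
ifenslave bond0 eth0 eth1

# Assign the single IP address the aggregate will use.
ip addr add 192.168.1.10/24 dev bond0

# Verify LACP negotiation with the switch.
cat /proc/net/bonding/bond0
```

The corresponding switch ports must be configured into a matching LACP
channel group, or the bond will fall back to a single active link.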


On 1/11/09 11:37 AM, "Peter Grandi" <pg_lus at lus.for.sabi.co.UK> etched on
stone tablets:

>> I have two boxes that have this:
>> [root at lustrethree Desktop]# ifconfig
>> eth0      Link encap:Ethernet  HWaddr 00:1B:21:2A:17:76
>>           inet addr:  Bcast:  Mask:
>>           RX bytes:120168321 (114.6 MiB)  TX bytes:5300070662 (4.9 GiB)
>> [ ... ]
>> eth1      Link encap:Ethernet  HWaddr 00:1B:21:2A:1C:DC
>>           inet addr:  Bcast:  Mask:
>>           RX bytes:55673426 (53.0 MiB)  TX bytes:846 (846.0 b)
>> [ ... another 4 like that, ... ]
> That's a very bizarre network configuration: you have 5
> interfaces on the same subnet (presumably all plugged into the
> same switch) with no load balancing, as all the outgoing traffic
> goes via 'eth0'.
> You have some better alternatives:
> * Use bonding (if the switch supports it) to tie together the 5
>   interfaces as one virtual interface with a single IP address.
> * Use something like 'nexthop' routing (and a couple other
>   tricks) to split the load across the several interfaces. This
>   is easier for the outgoing traffic than the incoming traffic,
>   but it seems you have a lot more outgoing traffic.
> * Use 1 10Gb/s card per server and a 1Gb/s switch with 2 10Gb/s
>   ports. 10Gb/s cards and switches have fallen in price a lot
>   recently (check Myri.com), and a server that can do several
>   hundred MB/s really deserves a nice 10Gb/s interface.
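The 'nexthop' trick mentioned above can be sketched with iproute2 as a
multipath default route (addresses, gateway, and device names here are
purely illustrative):

```shell
# Split outgoing traffic across two interfaces with a
# multipath default route; the kernel balances per-flow,
# so a single TCP stream still uses only one link.
ip route replace default scope global \
    nexthop via 192.168.1.1 dev eth0 weight 1 \
    nexthop via 192.168.1.1 dev eth1 weight 1
```

As the poster notes, this only balances outgoing traffic; incoming traffic
still arrives on whichever interface the peers' ARP caches point at.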
> IIRC 'lnet' has something like bonding built in, but I am not
> sure that it handles multiple addresses in the same subnet well.
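Pointing LNet at a single bonded interface, rather than at several raw NICs
on the same subnet, sidesteps that issue entirely. Assuming a bond0 device
like the one sketched earlier, the usual module setting is:

```shell
# /etc/modprobe.d/lustre.conf
# Run the tcp0 LNet network over the bonded interface,
# so LNet sees one NID instead of five same-subnet NICs.
options lnet networks="tcp0(bond0)"
```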
>> Would it be better to have these two boxes as OSS's or as MDT
>> or MGS machines?  Currently they are configured 1 as a MGS and
>> the other as the MDT.
> If these are the two servers with gigantic disk arrays, I'd run
> both MDS and OSS on each, possibly with the OSTs replicated
> across both machines in an active/passive configuration.
>> The question is does LNET use the available tcp0 connections
>> different from the OSS perspective as opposed to the MDT or
>> MGS perspective?
> Not sure what the question means.
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss