[Lustre-discuss] Lnet configuration: 1 ost per gige interface.
Klaus.Steden at thomson.net
Wed Oct 22 12:22:08 PDT 2008
If you want to use a true link aggregation protocol such as LACP or Cisco's EtherChannel, you'll need a managed switch that supports the protocol as well (and a Cisco switch at that, in the case of EtherChannel). Both partners in a link aggregate must be aware of the aggregate, and ports that are not part of the aggregate cannot be connected to it, and vice versa (although typically a switch will simply stop forwarding if it detects agg links on non-agg ports, or non-agg links on agg ports).
In the case of ALB, the uplink switch does not need to be made aware that there's an aggregate, since the kernel that manages the aggregate will transparently remap everything. The switch will simply notice that a given IP is now associated with a new MAC address and update its ARP cache. This can be noisy in switch logs, but on a dumb switch, nobody's the wiser. My gut tells me this works, even with a dumb switch.
The only way to know for sure, though, is simply to test it out yourself. I would try it out with a pair of dumb switches connected together rather than putting it directly on your network. If STP is active, plugging in this configuration may shut down your whole network if STP thinks it found a loop, so test it out in a sandbox before you go live.
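When testing in a sandbox, it helps to confirm that the bond really came up in mode 6 and that every slave has link before drawing conclusions about the switch. Below is a minimal Python sketch, assuming the bond is named bond0 and that the kernel exposes its state in /proc/net/bonding/bond0. The SAMPLE text is a hypothetical, trimmed illustration of that file, and the exact fields vary by kernel version.

```python
# Hypothetical, trimmed sample of /proc/net/bonding/bond0 for a two-slave
# balance-alb bond. Real output has more fields and varies by kernel version.
SAMPLE = """\
Bonding Mode: adaptive load balancing
MII Status: up

Slave Interface: eth0
MII Status: up

Slave Interface: eth1
MII Status: down
"""


def parse_bond_status(text):
    """Return (mode, {slave_name: mii_status}) parsed from bonding status text."""
    mode = None
    slaves = {}
    current = None  # name of the slave section we are currently inside
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or a line without "key: value" form
        key, value = key.strip(), value.strip()
        if key == "Bonding Mode":
            mode = value
        elif key == "Slave Interface":
            current = value
            slaves[current] = None
        elif key == "MII Status" and current is not None:
            # The bond-level MII Status appears before any slave and is skipped.
            slaves[current] = value
    return mode, slaves


def read_bond_status(path="/proc/net/bonding/bond0"):
    """Read and parse the status file of a live bond (the path is an assumption)."""
    with open(path) as f:
        return parse_bond_status(f.read())


if __name__ == "__main__":
    mode, slaves = parse_bond_status(SAMPLE)
    print(mode)
    for name, status in slaves.items():
        print(name, status)
```

On a live test box, calling read_bond_status() instead of parsing SAMPLE should show whether a slave dropped out after you plugged it into the second switch.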
From: lustre-discuss-bounces at lists.lustre.org on behalf of Joe Georger
Sent: Wed 10/22/2008 5:00 AM
To: lustre-discuss at lists.lustre.org
Subject: Re: [Lustre-discuss] Lnet configuration: 1 ost per gige interface.
Even for bonding mode 6?
balance-alb or 6
Adaptive load balancing: includes balance-tlb plus receive
load balancing (rlb) for IPV4 traffic, and does not require
any special switch support. The receive load balancing is
achieved by ARP negotiation.
The bonding driver intercepts the ARP Replies sent by the local
system on their way out and overwrites the source hardware address
with the unique hardware address of one of the slaves in the bond
such that different peers use different hardware addresses for the
server.
Receive traffic from connections created by the server is also
balanced. When the local system sends an ARP Request the bonding
driver copies and saves the peer's IP information from the ARP packet.
When the ARP Reply arrives from the peer, its hardware address is
retrieved and the bonding driver initiates an ARP reply to this peer
assigning it to one of the slaves in the bond.
A problematic outcome of using ARP negotiation for balancing is that
each time that an ARP request is broadcast it uses the hardware
address of the bond. Hence, peers learn the hardware address of the
bond and the balancing of receive traffic collapses to the current
slave. This is handled by sending updates (ARP Replies) to all the
peers with their individually assigned hardware address such that
the traffic is redistributed. Receive traffic is also redistributed
when a new slave is added to the bond and when an inactive slave is
re-activated. The receive load is distributed sequentially (round
robin) among the group of highest speed slaves in the bond.
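The round-robin distribution described above can be illustrated with a small Python sketch. It is not the bonding driver's code, only a toy model under stated assumptions: the function name assign_peers, the tuple layout for slaves, and the MAC and IP addresses are all hypothetical.

```python
from itertools import cycle


def assign_peers(peers, slaves):
    """Map each peer IP to a slave MAC, round robin over the fastest active slaves.

    peers:  list of peer IP strings, in the order they are assigned.
    slaves: list of (mac, speed_mbps, is_up) tuples (hypothetical layout).
    """
    active = [(mac, speed) for mac, speed, up in slaves if up]
    if not active:
        return {}
    top_speed = max(speed for _, speed in active)
    # Only slaves in the highest-speed group take part in receive balancing.
    fastest = [mac for mac, speed in active if speed == top_speed]
    rr = cycle(fastest)
    return {peer: next(rr) for peer in peers}


if __name__ == "__main__":
    slaves = [
        ("00:00:00:00:00:01", 1000, True),
        ("00:00:00:00:00:02", 1000, True),
        ("00:00:00:00:00:03", 100, True),   # slower link, left out
    ]
    peers = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]
    for peer, mac in assign_peers(peers, slaves).items():
        print(peer, "->", mac)
```

Calling assign_peers again with one slave marked down models the redistribution step: every peer is reassigned across whichever slaves remain, which the driver communicates by sending each peer a fresh ARP Reply.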
When a link is reconnected or a new slave joins the bond the receive
traffic is redistributed among all active slaves in the bond by
initiating ARP Replies with the selected mac address to each of the
clients. The updelay parameter (detailed below) must be set to a
value equal or greater than the switch's forwarding delay so that
the ARP Replies sent to the peers will not be blocked by the switch.
Prerequisites:
1. Ethtool support in the base drivers for retrieving the
speed of each slave.
2. Base driver support for setting the hardware address of
a device while it is open. This is required so that
there will always be one slave in the team using the
bond hardware address (the curr_active_slave) while
having a unique hardware address for each slave in the
bond. If the curr_active_slave fails its hardware
address is swapped with the new curr_active_slave that
Brian J. Murrell wrote:
> On Tue, 2008-10-21 at 12:15 -0500, Hendelman, Rob wrote:
>> I was under the impression that bonding nics required a managed switch to
>> support this.
> It does require a switch that supports link aggregation yes. Sorry, I
> overlooked that you only had a dumb switch.