[Lustre-discuss] Lnet configuration: 1 ost per gige interface.

Steden Klaus Klaus.Steden at thomson.net
Wed Oct 22 12:22:08 PDT 2008

Hi Joe,

If you want to use a true link aggregation protocol such as LACP or Cisco's Etherchannel, you'll need a managed switch that supports the protocol as well (and a Cisco switch at that, in the case of Etherchannel). Both partners in a link aggregate must be aware of the aggregate. Ports that are not part of the aggregate cannot be connected to it, and vice-versa (although typically a switch will simply stop forwarding if it detects agg links on non-agg ports, or non-agg links on agg ports).

In the case of ALB, the uplink switch does not need to be made aware that there's an aggregate, since the kernel that manages the aggregate will transparently remap everything. The switch will simply notice that a given IP is now associated with a new MAC address and update its ARP cache. This can be noisy in switch logs, but on a dumb switch, nobody's the wiser. My gut tells me this works, even with a dumb switch.
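
To illustrate why I expect this to work, here is a minimal Python sketch of a toy unmanaged switch that only learns source MAC addresses per port. It is not a model of any real switch firmware; the class name, port numbers, and MAC addresses are all made up for the example. The point is that each bond slave shows up as an ordinary host with its own MAC on its own port, so nothing about the aggregate has to be configured on the switch.

```python
class LearningSwitch:
    """Toy unmanaged L2 switch: learns source MACs per port, floods unknowns."""

    def __init__(self, num_ports):
        self.num_ports = num_ports
        self.mac_table = {}  # MAC address -> port it was last seen on

    def receive(self, in_port, src_mac, dst_mac):
        """Process one frame and return the list of ports it is sent out on."""
        # Learn (or refresh) where the sender lives.
        self.mac_table[src_mac] = in_port
        out_port = self.mac_table.get(dst_mac)
        if out_port is None:
            # Unknown destination: flood to every port except the ingress port.
            return [p for p in range(self.num_ports) if p != in_port]
        if out_port == in_port:
            # Destination is on the same segment; nothing to forward.
            return []
        return [out_port]


# Hypothetical addresses: two bond slaves on the server, two clients.
SLAVE0, SLAVE1 = "00:00:00:00:00:a0", "00:00:00:00:00:a1"
CLIENT_X, CLIENT_Y = "00:00:00:00:00:c2", "00:00:00:00:00:c3"

switch = LearningSwitch(num_ports=4)

# Each slave transmits with its own source MAC, so the switch learns
# SLAVE0 on port 0 and SLAVE1 on port 1 like any two separate hosts.
switch.receive(0, SLAVE0, CLIENT_X)
switch.receive(1, SLAVE1, CLIENT_Y)

# ALB has told client X to use SLAVE0's MAC and client Y to use SLAVE1's,
# so their traffic to the same server IP arrives over different links.
to_slave0 = switch.receive(2, CLIENT_X, SLAVE0)
to_slave1 = switch.receive(3, CLIENT_Y, SLAVE1)
print(to_slave0, to_slave1)
```

If a slave fails and its MAC moves to another port, the same learning step simply overwrites the table entry the next time a frame with that source MAC arrives.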

The only way to know for sure, though, is simply to test it out yourself. I would try it out with a pair of dumb switches connected together rather than putting it directly on your network. If STP is active, plugging in this configuration may shut down your whole network if STP thinks it found a loop, so test it out in a sandbox before you go live.


-----Original Message-----
From: lustre-discuss-bounces at lists.lustre.org on behalf of Joe Georger
Sent: Wed 10/22/2008 5:00 AM
To: lustre-discuss at lists.lustre.org
Subject: Re: [Lustre-discuss] Lnet configuration: 1 ost per gige interface.
Even for bonding mode 6?


      balance-alb or 6 
          Adaptive load balancing: includes balance-tlb plus receive
          load balancing (rlb) for IPV4 traffic, and does not require
          any special switch support. The receive load balancing is
          achieved by ARP negotiation. 

    The bonding driver intercepts the ARP Replies sent by the local
    system on their way out and overwrites the source hardware address
    with the unique hardware address of one of the slaves in the bond
    such that different peers use different hardware addresses for the
    server.

    Receive traffic from connections created by the server is also
    balanced. When the local system sends an ARP Request the bonding
    driver copies and saves the peer's IP information from the ARP packet. 
    When the ARP Reply arrives from the peer, its hardware address is
    retrieved and the bonding driver initiates an ARP reply to this peer
    assigning it to one of the slaves in the bond. 
    A problematic outcome of using ARP negotiation for balancing is that
    each time that an ARP request is broadcast it uses the hardware
    address of the bond. Hence, peers learn the hardware address of the
    bond and the balancing of receive traffic collapses to the current
    slave. This is handled by sending updates (ARP Replies) to all the
    peers with their individually assigned hardware address such that
    the traffic is redistributed. Receive traffic is also redistributed
    when a new slave is added to the bond and when an inactive slave is
    re-activated. The receive load is distributed sequentially (round
    robin) among the group of highest speed slaves in the bond. 
    When a link is reconnected or a new slave joins the bond the receive
    traffic is redistributed among all active slaves in the bond by
    initiating ARP Replies with the selected mac address to each of the
    clients. The updelay parameter (detailed below) must be set to a
    value equal or greater than the switch's forwarding delay so that
    the ARP Replies sent to the peers will not be blocked by the switch. 
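
The updelay requirement above can be turned into a small calculation. The following is only a sketch in Python: the function name is hypothetical, and it assumes (as I understand the bonding driver to behave) that updelay is rounded down to a multiple of miimon, so the helper rounds up to the next multiple instead. The switch's forwarding delay has to come from your own switch's documentation or STP settings; it is an input here, not something the code can discover.

```python
def pick_updelay(forwarding_delay_ms, miimon_ms):
    """Smallest multiple of miimon_ms that is >= forwarding_delay_ms.

    Hypothetical helper: both arguments are in milliseconds and must be
    supplied by the caller.
    """
    if miimon_ms <= 0:
        raise ValueError("miimon_ms must be positive")
    if forwarding_delay_ms < 0:
        raise ValueError("forwarding_delay_ms must not be negative")
    # Ceiling division using integers only.
    intervals = -(-forwarding_delay_ms // miimon_ms)
    return intervals * miimon_ms


# Example inputs (assumed values, not measured on any real switch).
updelay = pick_updelay(forwarding_delay_ms=30050, miimon_ms=100)
print(updelay)
```

The result would then be used as the updelay value alongside the same miimon value when loading the bonding driver.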

        * Prerequisites:
             1. Ethtool support in the base drivers for retrieving the
                speed of each slave.
             2. Base driver support for setting the hardware address of
                a device while it is open. This is required so that
                there will always be one slave in the team using the
                bond hardware address (the curr_active_slave) while
                having a unique hardware address for each slave in the
                bond. If the curr_active_slave fails its hardware
                address is swapped with the new curr_active_slave that
                was chosen.
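
The receive load balancing described in the quoted documentation can be sketched roughly as follows. This is a simplified Python illustration, not the bonding driver's actual code: the function name, the use of a dict for slaves, and the addresses are assumptions made for the example. It only shows the stated policy of handing peers to the highest-speed slaves in round-robin order.

```python
from itertools import cycle


def assign_peers(slaves, peers):
    """Map each peer IP to the MAC of the slave that should receive its traffic.

    slaves: dict of slave MAC -> link speed in Mbps (active slaves only).
    peers:  list of peer IPs in the order they were learned.
    Only the slaves sharing the highest speed take part in the rotation.
    """
    if not slaves:
        raise ValueError("at least one active slave is required")
    top_speed = max(slaves.values())
    fastest = [mac for mac, speed in slaves.items() if speed == top_speed]
    rotation = cycle(fastest)
    return {peer: next(rotation) for peer in peers}


# Hypothetical bond: two gigabit slaves and one 100 Mbps slave.
slaves = {
    "00:00:00:00:00:a0": 1000,
    "00:00:00:00:00:a1": 1000,
    "00:00:00:00:00:a2": 100,
}
peers = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]

assignment = assign_peers(slaves, peers)
for peer, mac in assignment.items():
    # In the real driver this is where an ARP Reply carrying the chosen
    # MAC would be sent to the peer.
    print(peer, "->", mac)
```

Calling the function again after adding or re-activating a slave mimics the redistribution step: every peer gets a fresh assignment and would be sent an updated ARP Reply.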

Brian J. Murrell wrote:
> On Tue, 2008-10-21 at 12:15 -0500, Hendelman, Rob wrote:
>> I was under the impression that bonding nics required a managed switch to
>> support this.
> It does require a switch that supports link aggregation yes.  Sorry, I
> overlooked that you only had a dumb switch.
> b.