[Lustre-discuss] FW: faking IB multi-rail with multihomed clients

Eric Barton eeb at sun.com
Fri Dec 21 09:14:35 PST 2007


Guys,

For those of you not party to the original email exchange, this is
about how we can aggregate bandwidth across both rails of a dual-rail
IB cluster using current Lustre/LNET (i.e. before we have implemented
transparent LNET support for failover and bandwidth aggregation across
multiple networks).

The following 2 points are fundamental - everything below is a direct
consequence...

1. LNET is perfectly happy with multiple rails, but it doesn't load
   balance over them - the rail actually used for any communication
   is determined by the peer NID.

2. Lustre always uses the same NID to talk to a given server from a
   given node.  It chooses the NID (a) with the fewest hops (to
   minimize routing) and (b) appearing first in the "networks" or
   "ip2nets" LNET configuration strings.
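
To make point 2 concrete, here is a minimal Python sketch of the selection
rule as described above.  It is only a model, not the actual Lustre code:
the Candidate type, its field names and choose_nid() are hypothetical, and
the NIDs in the usage note are made-up examples.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """One NID a server can be reached on (hypothetical model type)."""
    nid: str           # e.g. "192.168.0.5@o2ib0"
    hops: int          # number of routing hops needed to reach this NID
    config_order: int  # position of its network in "networks"/"ip2nets"


def choose_nid(candidates: List[Candidate]) -> str:
    """Pick the NID with the fewest hops; ties go to the network
    that appears first in the local LNET configuration."""
    if not candidates:
        raise ValueError("no candidate NIDs")
    best = min(candidates, key=lambda c: (c.hops, c.config_order))
    return best.nid
```

With two directly reachable NIDs, say 192.168.1.5@o2ib1 (config_order 1)
and 192.168.0.5@o2ib0 (config_order 0), this always returns the o2ib0 NID,
which is why a node never spreads traffic to one server over both rails.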

Now consider a 2-rail IB cluster running the OFA stack (i.e. OFED)
with the following IPoIB address assignments...

             ib0                 ib1
Servers      192.168.0.*         192.168.1.*
Clients      192.168.[2-127].*   192.168.[128-253].*

...here are some different configurations you could create...

A. I've got many more clients than servers in my cluster.  I don't
   care if an individual client can't get 2 rails of bandwidth because
   the servers are the actual bottleneck...

   ip2nets="o2ib0(ib0),o2ib1(ib1) 192.168.[0-1].*          #all servers;\
            o2ib0(ib0)            192.168.[2-253].[0-252/2]#even clients;\
            o2ib1(ib1)            192.168.[2-253].[1-253/2]#odd clients"

   This configuration gives every server 2 NIDs, one on each network -
   and statically load balances clients between the rails.
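
If you want to check which rule a given address falls under, the address
range expressions used here can be modelled with a few lines of Python.
This is a sketch of the syntax as it appears in these examples (plain
numbers, "*", and bracketed "lo-hi/stride" ranges), not LNET's own parser;
octet_matches() and ip_matches() are hypothetical helper names.

```python
import re


def octet_matches(expr: str, value: int) -> bool:
    """Return True if one octet expression matches an octet value."""
    if expr == "*":
        return True
    if expr.startswith("[") and expr.endswith("]"):
        # Comma-separated items, each "n", "lo-hi" or "lo-hi/stride".
        for item in expr[1:-1].split(","):
            m = re.fullmatch(r"(\d+)(?:-(\d+)(?:/(\d+))?)?", item.strip())
            if m is None:
                raise ValueError(f"bad range item: {item!r}")
            lo = int(m.group(1))
            hi = int(m.group(2)) if m.group(2) else lo
            stride = int(m.group(3)) if m.group(3) else 1
            if lo <= value <= hi and (value - lo) % stride == 0:
                return True
        return False
    return int(expr) == value


def ip_matches(pattern: str, ip: str) -> bool:
    """Return True if a dotted-quad address matches a dotted pattern."""
    exprs = pattern.split(".")
    octets = [int(o) for o in ip.split(".")]
    if len(exprs) != 4 or len(octets) != 4:
        return False
    return all(octet_matches(e, o) for e, o in zip(exprs, octets))
```

For example, ip_matches("192.168.[2-253].[0-252/2]", "192.168.5.10") is
true (an "even" client), while the same pattern rejects 192.168.5.11,
which the "odd" pattern 192.168.[2-253].[1-253/2] accepts instead.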

B. A single client must get 2 rails worth of bandwidth and I don't
   care if the max aggregate bandwidth is only (# servers) * (1 rail)...

   ip2nets="o2ib0(ib0)            192.168.[0-1].[0-252/2]#even servers;\
            o2ib1(ib1)            192.168.[0-1].[1-253/2]#odd servers;\
            o2ib0(ib0),o2ib1(ib1) 192.168.[2-253].*      #clients"

   This configuration gives every server a single NID on one rail or
   the other.  Clients have a NID on both rails.

C. I don't care how many hoops I have to jump through, but I really
   want all my clients and all my servers to use both rails...

   ip2nets="o2ib0(ib0),o2ib2(ib1) 192.168.[0-1].[0-252/2]  #even servers;\
            o2ib1(ib0),o2ib3(ib1) 192.168.[0-1].[1-253/2]  #odd servers;\
            o2ib0(ib0),o2ib3(ib1) 192.168.[2-253].[0-252/2]#even clients;\
            o2ib1(ib0),o2ib2(ib1) 192.168.[2-253].[1-253/2]#odd clients"

   This configuration includes 2 additional "fake" o2ib networks to
   work around Lustre's simplistic NID selection algorithm. It
   connects "even" clients to "even" servers with o2ib0 on rail0 and
   to "odd" servers with o2ib3 on rail1.  Similarly it connects "odd"
   clients to "odd" servers with o2ib1 on rail0 and to "even" servers
   with o2ib2 on rail1.
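
The reason configuration C works is that every client/server pair has
exactly one network in common, so the NID selection has no choice to make.
The Python sketch below just transcribes the four ip2nets rules of C into
a table and looks up the shared network; NETS and shared_link() are
hypothetical names used for illustration only.

```python
# Network -> interface assignments copied from configuration C.
NETS = {
    ("server", "even"): {"o2ib0": "ib0", "o2ib2": "ib1"},
    ("server", "odd"):  {"o2ib1": "ib0", "o2ib3": "ib1"},
    ("client", "even"): {"o2ib0": "ib0", "o2ib3": "ib1"},
    ("client", "odd"):  {"o2ib1": "ib0", "o2ib2": "ib1"},
}


def shared_link(client_parity: str, server_parity: str):
    """Return (network, interface) used between a client and a server."""
    client = NETS[("client", client_parity)]
    server = NETS[("server", server_parity)]
    common = [net for net in client if net in server]
    if len(common) != 1:
        raise ValueError(f"expected exactly one shared network: {common}")
    net = common[0]
    return net, client[net]


for c in ("even", "odd"):
    for s in ("even", "odd"):
        net, iface = shared_link(c, s)
        print(f"{c} client -> {s} server: {net} on {iface}")
```

Each client therefore talks to half of the servers over ib0 and to the
other half over ib1, and the same holds for each server with respect to
the clients.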

Hope this demystifies things :)  

    Cheers,
              Eric




