[Lustre-discuss] tcp network load balancing understanding lustre 1.8

Andreas Dilger adilger at sun.com
Sat May 9 11:31:27 PDT 2009


On May 09, 2009  09:18 -0700, Arden Wiebe wrote:
> This might help answer some questions.
> http://ioio.ca/Lustre-tcp-bonding/OST2.png which shows my mostly
> untuned OSS and OSTs pulling 400+MiB/s over TCP bonding provided by the
> kernel, complete with a cat of the modprobe.conf file.  You have the other
> links I've sent you, but the picture above is relevant to your questions.

Arden, thanks for sharing this info.  Any chance you could post it to 
wiki.lustre.org?  It would seem there is one bit of info missing somewhere -
how does bond0 know which interfaces to use? 
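
One way to check which interfaces a bond has actually enslaved is to read
the bonding driver's status file.  A minimal sketch, assuming the Linux
bonding driver exposes /proc/net/bonding/bond0 with "Slave Interface:"
lines; the function name bond_slaves and the sample text are illustrative
only and are not taken from Arden's setup:

```python
def bond_slaves(status_text):
    """Return the slave interface names listed in bonding status text."""
    slaves = []
    for line in status_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Slave Interface":
            slaves.append(value.strip())
    return slaves


# Illustrative sample only; real status output has more fields per slave.
SAMPLE_STATUS = """\
Bonding Mode: load balancing (round-robin)
MII Status: up

Slave Interface: eth0
MII Status: up

Slave Interface: eth1
MII Status: up
"""

if __name__ == "__main__":
    try:
        # Path assumes the bond is named bond0.
        with open("/proc/net/bonding/bond0") as f:
            text = f.read()
    except OSError:
        text = SAMPLE_STATUS  # fall back to the sample on hosts without a bond
    print(bond_slaves(text))
```

This only reports what the bond is using at the moment; it does not show
where that assignment was configured.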


Also, another oddity - the network monitor is showing 450MiB/s Received,
yet the disk is showing only about 170MiB/s going to the disk.  Either
something is wacky with the monitoring (e.g. it is counting Received for
both the eth* networks AND bond0), or Lustre is doing something very
weird and retransmitting the bulk data like crazy (seems unlikely).
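
The double-counting guess can be checked against the kernel's own
counters.  A rough sketch, assuming the usual /proc/net/dev layout (first
number after the colon is received bytes) and a bond named bond0 with
slaves eth0 and eth1; the helper names and sample numbers are made up for
illustration:

```python
def rx_bytes(proc_net_dev_text):
    """Map interface name -> received-bytes counter from /proc/net/dev text."""
    counters = {}
    for line in proc_net_dev_text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if not sep or not fields or not fields[0].isdigit():
            continue  # the two header lines carry no counters
        counters[name.strip()] = int(fields[0])
    return counters


def double_counted_rx(counters, bond="bond0", slaves=("eth0", "eth1")):
    """Total a monitor would report if it summed the bond AND its slaves."""
    return counters[bond] + sum(counters[s] for s in slaves)


# Made-up sample: the bond's counter equals the sum of its slaves' counters.
SAMPLE_DEV = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:       0       0    0    0    0     0          0         0        0       0    0    0    0     0       0          0
  eth0:     100       1    0    0    0     0          0         0       10       1    0    0    0     0       0          0
  eth1:     125       1    0    0    0     0          0         0       10       1    0    0    0     0       0          0
 bond0:     225       2    0    0    0     0          0         0       20       2    0    0    0     0       0          0
"""

if __name__ == "__main__":
    counters = rx_bytes(SAMPLE_DEV)
    print("bond0 alone:", counters["bond0"])
    print("bond0 plus slaves:", double_counted_rx(counters))
```

If the monitor reports the second figure rather than the first, the real
receive rate is half of what it displays.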


> --- On Thu, 5/7/09, Michael Ruepp <michael at schwarzfilm.ch> wrote:
> 
> > From: Michael Ruepp <michael at schwarzfilm.ch>
> > Subject: [Lustre-discuss] tcp network load balancing understanding lustre 1.8
> > To: lustre-discuss at lists.lustre.org
> > Date: Thursday, May 7, 2009, 5:50 AM
> > Hi there,
> > 
> > I have configured a simple tcp lustre 1.8 with one mdc (one nic)
> > and two oss (four nics per oss).
> > As in the 1.6 documentation, the multihomed section is a little
> > unclear to me.
> > 
> > I give every NID an IP in the same subnet, e.g.:
> > 10.111.20.35-38 - oss0
> > and 10.111.20.39-42 - oss1
> > 
> > Do I have to make modprobe.conf.local look like this to force
> > lustre to use all four interfaces in parallel:
> > 
> > options lnet networks=tcp0(eth0,eth1,eth2,eth3)
> > Because on Page 138 the 1.8 Manual says:
> > "Note – In the case of TCP-only clients, the first available
> > non-loopback IP interface is used for tcp0 since the interfaces
> > are not specified."
> > 
> > or do I have to specify it like this:
> > options lnet networks=tcp
> > Because on Page 112 the lustre 1.6 Manual says:
> > "Note – In the case of TCP-only clients, all available IP
> > interfaces are used for tcp0 since the interfaces are not
> > specified. If there is more than one, the IP of the first one
> > found is used to construct the tcp0 ID."
> > 
> > This is the opposite of what the 1.8 Manual says.
> > 
> > My goal is to let lustre utilize all four Gb links in parallel.
> > My lustre clients are equipped with two Gb links (eth0, eth1),
> > which should be utilized by the clients as well.
> > 
> > Or is bonding the better solution in terms of performance?
> > 
> > Thanks very much for input,
> > 
> > Michael Ruepp
> > Schwarzfilm AG
> > 
> > 
> > _______________________________________________
> > Lustre-discuss mailing list
> > Lustre-discuss at lists.lustre.org
> > http://lists.lustre.org/mailman/listinfo/lustre-discuss
> > 

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.



