[Lustre-discuss] tcp network load balancing understanding lustre 1.8

Arden Wiebe albert682 at yahoo.com
Sat May 9 21:15:04 PDT 2009


bond0 knows which interfaces to use because eth0 through eth5 are each designated as slaves of bond0 in their own configuration files.  The manual is fairly clear on that.
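
As a rough sketch of what that looks like, assuming a Red Hat style layout under /etc/sysconfig/network-scripts (the bonding mode, miimon value, IP address, and the lnet line below are illustrative assumptions, not values taken from the screenshots):

    # /etc/modprobe.conf
    alias bond0 bonding
    # mode and miimon are assumptions; choose what suits the switch
    options bond0 mode=balance-rr miimon=100
    # assumption: point LNET at the bond rather than the individual NICs
    options lnet networks=tcp0(bond0)

    # /etc/sysconfig/network-scripts/ifcfg-bond0
    DEVICE=bond0
    # hypothetical address
    IPADDR=10.111.20.35
    NETMASK=255.255.255.0
    BOOTPROTO=none
    ONBOOT=yes
    USERCTL=no

    # /etc/sysconfig/network-scripts/ifcfg-eth0 (repeat for eth1 through eth5)
    DEVICE=eth0
    MASTER=bond0
    SLAVE=yes
    BOOTPROTO=none
    ONBOOT=yes
    USERCTL=no

The MASTER=bond0 and SLAVE=yes lines in each ifcfg-ethN file are what tie the physical interfaces to the bond.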

In the screenshot, gnome system monitor shows memory use at 452.4 MiB of 7.8 GiB and sustained bandwidth to the OSS and OST of 404.2 MiB/s, which corresponds roughly to what collectl shows for KBWrite under Disks.  Collectl shows a few different results for Disks, Network and Lustre OST, and I believe the figure of around 170 MiB/s is measuring the other OST on the network, as you can see in the other screenshot for OST1, or lustrethree.

In the screenshots: Lustreone = MGS, Lustretwo = MDT, Lustrethree = OSS + raid10 target, Lustrefour = OSS + raid10 target.

To help clarify the entire network and the stress testing I did with all the clients I could give it, see www.ioio.ca/Lustre-tcp-bonding/images/html and www.ioio.ca/Lustre-tcp-bonding/Lustre-notes/images.html

Proper benchmarking would be nice though, as I just hit it with everything I could and it lived, so I was happy.  I found the manual lacking on benchmarking and really wanted to make nice graphs of it all, but for some reason failed to do so with iozone.

I'll be taking a run at upgrading everything to 1.8 in the coming week or so, and when I do I'll grab some new screenshots and post the relevant items to the wiki.  Otherwise, if someone else wants to post the existing screenshots, you're welcome to use them, as they do detail a ground-up build.  Apparently 1.8 is great with small files now, so it should work even better with www.oil-gas.ca/phpsysinfo and www.linuxguru.ca/phpsysinfo
 

--- On Sat, 5/9/09, Andreas Dilger <adilger at sun.com> wrote:

> From: Andreas Dilger <adilger at sun.com>
> Subject: Re: [Lustre-discuss] tcp network load balancing understanding lustre 1.8
> To: "Arden Wiebe" <albert682 at yahoo.com>
> Cc: lustre-discuss at lists.lustre.org, "Michael Ruepp" <michael at schwarzfilm.ch>
> Date: Saturday, May 9, 2009, 11:31 AM
> On May 09, 2009  09:18 -0700, Arden Wiebe wrote:
> > This might help answer some questions.
> > http://ioio.ca/Lustre-tcp-bonding/OST2.png which shows my mostly not
> > tuned OSS and OST's pulling 400+MiB/s over TCP Bonding provided by the
> > kernel complete with a cat of the modprobe.conf file.  You have the other
> > links I've sent you but the picture above is relevant to your questions.
> 
> Arden, thanks for sharing this info.  Any chance you could post it to
> wiki.lustre.org?  It would seem there is one bit of info missing somewhere -
> how does bond0 know which interfaces to use?
> 
> 
> Also, another oddity - the network monitor is showing 450MiB/s Received,
> yet the disk is showing only about 170MiB/s going to the disk.  Either
> something is wacky with the monitoring (e.g. it is counting Received for
> both the eth* networks AND bond0), or Lustre is doing something very
> weird and retransmitting the bulk data like crazy (seems unlikely).
> 
> 
> > --- On Thu, 5/7/09, Michael Ruepp <michael at schwarzfilm.ch> wrote:
> > 
> > > From: Michael Ruepp <michael at schwarzfilm.ch>
> > > Subject: [Lustre-discuss] tcp network load balancing understanding lustre 1.8
> > > To: lustre-discuss at lists.lustre.org
> > > Date: Thursday, May 7, 2009, 5:50 AM
> > > Hi there,
> > > 
> > > I have configured a simple tcp lustre 1.8 with one mdc (one nic) and two
> > > oss (four nic per oss).
> > > As in the 1.6 documentation, the multihomed section is a
> > > little bit unclear to me.
> > > 
> > > I give every NID an IP in the same subnet, e.g. 10.111.20.35-38 for oss0
> > > and 10.111.20.39-42 for oss1.
> > > 
> > > Do I have to make modprobe.conf.local look like this to force lustre
> > > to use all four interfaces in parallel:
> > > 
> > > options lnet networks=tcp0(eth0,eth1,eth2,eth3)
> > > Because on Page 138 the 1.8 Manual says:
> > > "Note – In the case of TCP-only clients, the first available non-loopback
> > > IP interface is used for tcp0 since the interfaces are not specified."
> > > 
> > > or do I have to specify it like this:
> > > options lnet networks=tcp
> > > Because on Page 112 the lustre 1.6 Manual says:
> > > "Note – In the case of TCP-only clients, all available IP interfaces
> > > are used for tcp0 since the interfaces are not specified. If there is more
> > > than one, the IP of the first one found is used to construct the tcp0 ID."
> > > 
> > > Which is the opposite of the 1.8 Manual
> > > 
> > > My goal is to let lustre utilize all four Gb links in parallel. And my
> > > Lustre clients are equipped with two Gb links which should be utilized
> > > by the lustre clients as well (eth0, eth1).
> > > 
> > > Or is bonding the better solution in terms of performance?
> > > 
> > > Thanks very much for input,
> > > 
> > > Michael Ruepp
> > > Schwarzfilm AG
> > > 
> > > 
> > > _______________________________________________
> > > Lustre-discuss mailing list
> > > Lustre-discuss at lists.lustre.org
> > > http://lists.lustre.org/mailman/listinfo/lustre-discuss
> > > 
> > 
> > 
> 
> Cheers, Andreas
> --
> Andreas Dilger
> Sr. Staff Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.
> 
> 


      


