[Lustre-discuss] tcp network load balancing understanding lustre 1.8

Christopher J. Walker C.J.Walker at qmul.ac.uk
Sun May 10 07:07:08 PDT 2009


Mag Gam wrote:
> Thanks for the screenshot, Arden.
> 
> What is the maximum # of slaves you can have on a bonded interface?
> 
> 
> 
> On Sun, May 10, 2009 at 12:15 AM, Arden Wiebe <albert682 at yahoo.com> wrote:
>> Bond0 knows which interfaces to utilize because eth0-eth5 are designated as slaves in their configuration files.  The manual is fairly clear on that.
>>
>> In the screenshot, the memory used in GNOME System Monitor is at 452.4 MiB of 7.8 GiB, and the sustained bandwidth to the OSS and OST is 404.2 MiB/s, which corresponds roughly to what collectl is showing for KBWrite for Disks.  Collectl shows a few different results for Disks, Network and Lustre OST, and I believe it is also measuring the other OST on the network at around 170 MiB/s, if you view the other screenshot for OST1 or lustrethree.
>>
>> In the screenshots: Lustreone = MGS, Lustretwo = MDT, Lustrethree = OSS + RAID10 target, Lustrefour = OSS + RAID10 target.
>>
>> To help clarify, the entire network and the stress testing I did with all the clients I could give it are documented at www.ioio.ca/Lustre-tcp-bonding/images/html and www.ioio.ca/Lustre-tcp-bonding/Lustre-notes/images.html
>>
>> Proper benchmarking would be nice, though; I just hit it with everything I could and it lived, so I was happy.  I found the manual lacking on benchmarking, and I really wanted to make nice graphs of it all, but for some reason I failed to do so with iozone.
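
(For anyone following along: the slave designation Arden mentions is just 
the MASTER/SLAVE directives in the interface configuration files. A 
RHEL-style sketch, with device names and the address assumed, not taken 
from Arden's setup:)

  # /etc/sysconfig/network-scripts/ifcfg-bond0
  DEVICE=bond0
  IPADDR=192.168.1.10
  NETMASK=255.255.255.0
  ONBOOT=yes
  BOOTPROTO=none

  # /etc/sysconfig/network-scripts/ifcfg-eth0  (likewise for eth1-eth5)
  DEVICE=eth0
  MASTER=bond0
  SLAVE=yes
  ONBOOT=yes
  BOOTPROTO=none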

I too have been trying to benchmark a Lustre filesystem with iozone 3.321.

Sometimes it works, and sometimes it hangs.

I turned on debugging, and ran a test with 2 clients on each of 40 
machines. In the output, I get lines like:
  loop: R_STAT_DATA for client 9

For 79 of the 80 clients, there are two of these messages in the output; 
for the remaining client there is only one.
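
For reference, a distributed run like this is typically driven with 
iozone's -+m cluster mode and a client-list file, along these lines 
(hostnames, paths, and sizes below are illustrative, not my actual setup):

  # clients.txt: one line per client process:
  #   <hostname>  <working directory>  <path to iozone on that host>
  node01  /mnt/lustre/bench  /usr/local/bin/iozone
  node01  /mnt/lustre/bench  /usr/local/bin/iozone
  node02  /mnt/lustre/bench  /usr/local/bin/iozone
  ...

  # iozone uses rsh by default; point it at ssh instead
  export RSH=ssh
  iozone -+m clients.txt -t 80 -s 1g -r 1m -i 0 -i 1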

I've had a brief skim of the source code, and I think the problem is that 
iozone uses UDP packets to communicate. On a heavily loaded network, one 
of these is bound to get lost eventually, and presumably iozone doesn't 
have the right retry strategy to recover.
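
By "right retry strategy" I mean something like retransmit-on-timeout. A 
minimal sketch of that idea in C (hypothetical; this is not iozone's 
actual code):

  #include <string.h>
  #include <sys/types.h>
  #include <sys/time.h>
  #include <sys/select.h>
  #include <sys/socket.h>
  #include <netinet/in.h>

  /* Send a request and wait for a reply, retransmitting on timeout. */
  static ssize_t udp_request(int sock, const struct sockaddr_in *dst,
                             const char *msg, char *reply, size_t rlen)
  {
      int attempt;

      for (attempt = 0; attempt < 5; attempt++) {
          struct timeval tv = { 2, 0 };   /* wait 2 s for a reply */
          fd_set rfds;

          /* (re)send the request */
          sendto(sock, msg, strlen(msg), 0,
                 (const struct sockaddr *)dst, sizeof(*dst));

          FD_ZERO(&rfds);
          FD_SET(sock, &rfds);
          if (select(sock + 1, &rfds, NULL, NULL, &tv) > 0)
              return recvfrom(sock, reply, rlen, 0, NULL, NULL);

          /* timed out: fall through and retransmit */
      }
      return -1;  /* peer never answered; caller should report an error */
  }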

The iozone author has suggested using a different network for the timing 
packets, but I don't think I can justify the time or expense involved in 
building one purely to do some benchmarking.

Chris

PS: On a machine with two bonded Gigabit Ethernet cards, I found I needed 
two iozone threads to get the available bandwidth; one iozone thread 
seemed to get the bandwidth of one card only.
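
That's expected behaviour: the Linux bonding driver hashes each flow to a 
single slave, so one TCP connection can never exceed one link. The hash 
policy is chosen when the module loads, e.g. (mode and values here are an 
assumed example, not a recommendation):

  # /etc/modprobe.conf
  alias bond0 bonding
  options bond0 mode=802.3ad miimon=100 xmit_hash_policy=layer3+4

layer3+4 spreads different TCP connections across the slaves, but any one 
connection still sticks to a single card, hence the need for two iozone 
threads.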




