[Lustre-discuss] 1GB throughput limit on OST (1.8.5)?

Thu Jan 27 05:48:43 PST 2011

I guess you have two gigabit NICs bonded in mode 6, and not two 1 GBps
NICs? (B = bytes, b = bits.) The maximum aggregate throughput out of the
two bonded NICs would be about 200 MBps. I think mode 0 bonding only
works with Cisco EtherChannel or something similar on the switch side.
The same goes for the FC connection: it's 4 Gbps (not 4 GBps), or about
400-500 MBps maximum throughput.

It would also be worth checking the maximum read and write throughput
of the RAID controller, not just the network. When testing with dd,
some of the data stays in memory as dirty pages until it is flushed to
disk. I think the default dirty_background_ratio is 10% on RHEL 5, which
is a sizable amount if your OSSs have lots of RAM. There is a chance of
the OSS locking up once it hits the dirty_ratio limit, which is 40% by
default. So a more aggressive flush to disk (a lower
dirty_background_ratio), with more headroom before dirty_ratio is
reached, is generally desirable if your RAID controller can keep up
with it.

So with your current setup, I guess you could get a maximum of about
400 MBps out of both OSSs combined, if they each have two 1 Gb NICs.
If you have one of the Dell switches with four 10 Gb ports (their
PowerConnect 6248), 10 Gb NICs for your OSSs might be a cheaper way to
increase aggregate performance. I think over 1 GBps from a single client
is possible when you use InfiniBand and RDMA to deliver the data.

David Merhar wrote:
> Our OSS's with 2x1GB NICs (bonded) appear limited to 1GB worth of  
> write throughput each.
> 
> Our setup:
> 2 OSS serving 1 OST each
> Lustre 1.8.5
> RHEL 5.4
> New Dell M610's blade servers with plenty of CPU and RAM
> All SAN fibre connections are at least 4GB
> 
> Some notes:
> - A direct write (dd) from a single OSS to the OST gets 4GB, the OSS's  
> fibre wire speed.
> - A single client will get 2GB of lustre write speed, the client's  
> ethernet wire speed.
> - We've tried bond mode 6 and 0 on all systems.  With mode 6 we will  
> see both NICs on both OSSs receiving data.
> - We've tried multiple OSTs per OSS.
> 
> But 2 clients writing a file will get 2GB of total bandwidth to the  
> filesystems.  We have been unable to isolate any particular resource  
> bottleneck.  None of the systems (MDS, OSS, or client) seem to be  
> working very hard.
> 
> The 1GB per OSS threshold is so consistent, that it almost appears by  
> design - and hopefully we're missing something obvious.
> 
> Any advice?
> 
> Thanks.
> 
> djm