[Lustre-discuss] how to baseline the performance of a Lustre cluster?

Mon Jul 18 06:28:12 PDT 2011

On Fri, 15 Jul 2011, Theodore Omtzigt wrote:

> To me it looks very disappointing as we can get 3GB/s from the RAID
> controller aggregating a collection of raw SAS drives on the OSTs, and
> we should be able to get a peak of ~5GB/s from QDR IB.
>
> First question: is this baseline reasonable?

For starters, the theoretical peak of QDR IB is 4GB/s in terms of moving
real data. 40Gb/s is the signaling rate, and you need to factor in the
8b/10b encoding (on the IB link as well as on a PCIe 2.0 bus), so your
40Gb/s becomes 32Gb/s right off the bat. Now try to move some data with
something like MPI_Send and you will see that the real amount of data you
can send is more like 24Gb/s, or 3GB/s.
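
To make the arithmetic explicit, here is a rough sketch in shell. The
variable names are mine, and the 24Gb/s figure is just the ballpark quoted
above, not a measurement from your system:

```shell
# Back-of-the-envelope QDR InfiniBand bandwidth (integer arithmetic only).
signal_gbit=40                             # QDR 4x signaling rate, Gb/s
data_gbit=$(( signal_gbit * 8 / 10 ))      # 8b/10b: 8 data bits per 10 line bits
data_gbyte=$(( data_gbit / 8 ))            # bits -> bytes
observed_gbit=24                           # ballpark real-world figure quoted above
observed_gbyte=$(( observed_gbit / 8 ))

echo "theoretical: ${data_gbit} Gb/s = ${data_gbyte} GB/s"
echo "observed:    ${observed_gbit} Gb/s = ${observed_gbyte} GB/s"
```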

The test size for ost-survey is pretty small: 30MB. You can increase that
with the "-s" flag. Try at least 100MB.
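
As a sketch of what that invocation might look like: the mount point
/mnt/lustre is only a placeholder for wherever your client mounts the file
system, and depending on how the I/O kit was installed the script may be
named ost-survey.sh rather than ost-survey.

```shell
# Placeholder mount point; override by exporting LUSTRE_MNT first.
LUSTRE_MNT="${LUSTRE_MNT:-/mnt/lustre}"
SIZE_MB=100   # per-OST test size handed to -s, as suggested above

if command -v ost-survey >/dev/null 2>&1; then
    ost-survey -s "$SIZE_MB" "$LUSTRE_MNT"
else
    # Not installed (or named differently): just show the intended command.
    echo "ost-survey not found in PATH; would run: ost-survey -s $SIZE_MB $LUSTRE_MNT"
fi
```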

You should also turn off checksums to test raw performance. There is an
lctl conf_param to do this, but the quick and dirty route on the client is
the following bash loop:

# Disable checksums on every OSC on this client (run as root)
for OST in /proc/fs/lustre/osc/*/checksums
do
    echo 0 > "$OST"
done
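
Checksums are there to catch corruption on the wire, so you will probably
want them back on once the benchmark is done. A minimal sketch of wrapping
the same loop in a function that can set either value; the function name
and its optional directory argument are my own invention, and the argument
exists only so you can try it against a scratch directory first:

```shell
# set_checksums VALUE [OSC_DIR]
# Write VALUE (0 = off, 1 = on) to every OSC checksums file under OSC_DIR.
# OSC_DIR defaults to the /proc path used above.
set_checksums() {
    value=$1
    osc_dir=${2:-/proc/fs/lustre/osc}
    for f in "$osc_dir"/*/checksums
    do
        [ -e "$f" ] || continue      # glob matched nothing, skip
        echo "$value" > "$f"
    done
}

# Typical use around a benchmark run (as root on the client):
#   set_checksums 0    # disable before measuring
#   ...run ost-survey...
#   set_checksums 1    # re-enable afterwards
```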

For comparison's sake, on my latest QDR-connected Lustre file system, with
LSI 9285-8e controllers connected to JBODs of slow disks in 11-disk
RAID 6 stripes, I get around 500MB/s write and 350MB/s read using
ost-survey with 100MB data chunks.

Your numbers seem reasonable.

Tim

-- 
-------------------------------------------
Tim Carlson, PhD
Senior Research Scientist
Environmental Molecular Sciences Laboratory