[Lustre-discuss] how to baseline the performance of a Lustre cluster?
Theodore Omtzigt
theo at stillwater-sc.com
Fri Jul 15 13:30:47 PDT 2011
I got a basic Lustre cluster up and running and did two experiments:
1- using GbE as the interconnect
2- using QDR IB as the interconnect
Here are the simple performance results I collected using the pointers
from the Lustre user guide:
[root@22-82 ~]# ost-survey /mnt/lustre01
/usr/bin/ost-survey: 06/29/11 OST speed survey on /mnt/lustre01 from 172.21.22.82@tcp
Number of Active OST devices : 8
Worst Read OST indx: 1 speed: 57.704295
Best Read OST indx: 3 speed: 61.655312
Read Average: 59.785920 +/- 1.245626 MB/s
Worst Write OST indx: 3 speed: 28.564328
Best Write OST indx: 5 speed: 70.976016
Write Average: 55.497721 +/- 13.457404 MB/s
Ost#  Read(MB/s)  Write(MB/s)  Read-time  Write-time
----------------------------------------------------
0     59.931      65.722       0.501      0.456
1     57.704      42.436       0.520      0.707
2     60.056      66.074       0.500      0.454
3     61.655      28.564       0.487      1.050
4     58.096      62.979       0.516      0.476
5     59.935      70.976       0.501      0.423
6     61.053      57.724       0.491      0.520
7     59.856      49.507       0.501      0.606
[root@22-82_ib ~]# ost-survey /mnt/lustre01
/usr/bin/ost-survey: 07/14/11 OST speed survey on /mnt/lustre01 from 10.1.3.82@o2ib
Number of Active OST devices : 8
Worst Read OST indx: 0 speed: 180.625987
Best Read OST indx: 6 speed: 214.961331
Read Average: 200.478485 +/- 11.408814 MB/s
Worst Write OST indx: 0 speed: 291.709350
Best Write OST indx: 6 speed: 496.616135
Write Average: 397.025375 +/- 59.815286 MB/s
Ost#  Read(MB/s)  Write(MB/s)  Read-time  Write-time
----------------------------------------------------
0     180.626     291.709      0.166      0.103
1     206.211     396.815      0.145      0.076
2     207.928     356.645      0.144      0.084
3     197.543     384.335      0.152      0.078
4     206.908     403.361      0.145      0.074
5     205.670     470.235      0.146      0.064
6     214.961     496.616      0.140      0.060
7     183.981     376.487      0.163      0.080
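
As a sanity check on the two runs, here is a minimal Python sketch that recomputes the summary lines from the per-OST rows and multiplies speed by time to estimate how much data each test moved. The rows are copied from the output above. The names GBE, QDR_IB and summarize are illustrative only and are not part of ost-survey. Using the population standard deviation is an assumption, although it does appear to reproduce the reported "+/-" values.

```python
from statistics import mean, pstdev

# Per-OST rows copied from the ost-survey output:
# (OST index, read MB/s, write MB/s, read time s, write time s)
GBE = [
    (0, 59.931, 65.722, 0.501, 0.456),
    (1, 57.704, 42.436, 0.520, 0.707),
    (2, 60.056, 66.074, 0.500, 0.454),
    (3, 61.655, 28.564, 0.487, 1.050),
    (4, 58.096, 62.979, 0.516, 0.476),
    (5, 59.935, 70.976, 0.501, 0.423),
    (6, 61.053, 57.724, 0.491, 0.520),
    (7, 59.856, 49.507, 0.501, 0.606),
]
QDR_IB = [
    (0, 180.626, 291.709, 0.166, 0.103),
    (1, 206.211, 396.815, 0.145, 0.076),
    (2, 207.928, 356.645, 0.144, 0.084),
    (3, 197.543, 384.335, 0.152, 0.078),
    (4, 206.908, 403.361, 0.145, 0.074),
    (5, 205.670, 470.235, 0.146, 0.064),
    (6, 214.961, 496.616, 0.140, 0.060),
    (7, 183.981, 376.487, 0.163, 0.080),
]


def summarize(rows):
    """Recompute the summary statistics for one ost-survey run."""
    reads = [r[1] for r in rows]
    writes = [r[2] for r in rows]
    # speed (MB/s) * time (s) = amount of data moved by each test (MB)
    sizes = [r[1] * r[3] for r in rows] + [r[2] * r[4] for r in rows]
    return {
        "read_mean": mean(reads),
        "read_std": pstdev(reads),    # assumption: population std dev
        "write_mean": mean(writes),
        "write_std": pstdev(writes),
        "mb_per_test": mean(sizes),
    }


gbe = summarize(GBE)
ib = summarize(QDR_IB)

for name, s in (("GbE", gbe), ("QDR IB", ib)):
    print(f"{name}: read {s['read_mean']:.1f} +/- {s['read_std']:.1f} MB/s, "
          f"write {s['write_mean']:.1f} +/- {s['write_std']:.1f} MB/s, "
          f"about {s['mb_per_test']:.0f} MB moved per test")

# Ratio of the per-OST averages between the two interconnects
read_speedup = ib["read_mean"] / gbe["read_mean"]
write_speedup = ib["write_mean"] / gbe["write_mean"]
print(f"IB vs GbE: read x{read_speedup:.2f}, write x{write_speedup:.2f}")
```

Speed times time comes out at roughly 30 MB for every row in both runs, so each measurement lasts well under a second on IB. Transfers that short may not reflect sustained throughput.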
Are these results any good?
To me they look very disappointing, as we can get 3GB/s from the RAID
controller aggregating a collection of raw SAS drives on the OSTs, and
we should be able to get a peak of ~5GB/s from QDR IB.
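
To put numbers on the gap, a rough Python sketch follows that expresses the reported QDR IB averages as a fraction of the two peak figures quoted above. It assumes 1 GB = 1000 MB and takes the 3GB/s and 5GB/s figures at face value. The helper fraction_of_peak is hypothetical. The ost-survey averages are per-OST figures measured from a single client, and it is not stated whether the 3GB/s figure applies per OST or per server, so the percentages are only indicative.

```python
MB_PER_GB = 1000  # assumption: decimal units

# Averages reported by ost-survey over o2ib (MB/s, per OST, single client)
observed = {"read": 200.478485, "write": 397.025375}

# Peak figures as quoted in the text, converted to MB/s
quoted_peak = {
    "RAID controller": 3 * MB_PER_GB,
    "QDR IB": 5 * MB_PER_GB,
}


def fraction_of_peak(observed_mb_s, peak_mb_s):
    """Return observed throughput as a fraction of the quoted peak."""
    return observed_mb_s / peak_mb_s


for op, speed in observed.items():
    for name, peak in quoted_peak.items():
        pct = 100 * fraction_of_peak(speed, peak)
        print(f"{op:5s} {speed:7.1f} MB/s = {pct:4.1f}% of {name} peak")
```

On these assumptions the per-OST averages are below 15% of either quoted peak.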
First question: is this baseline reasonable?
Second question: what are the tools I can use to better understand the
Lustre FS behavior to characterize the performance I am getting on the
client side?
I did check the IB network and I did not record any IB network errors
during these runs. So I am confident that the IB network was working
properly.
Looking forward to better understanding Lustre performance.