[Lustre-discuss] Two questions about the tuning of Lustre file system.

Tanin fanqielee at gmail.com
Thu May 19 16:29:23 PDT 2011


Dear all,

I have two questions regarding the performance of our Lustre file system.
Currently, we have 5 OSS nodes, and each OSS carries 8 OSTs. All the nodes
(including the MDS/MGS node and the client node) are connected to a Mellanox
MTS 3600 InfiniBand switch and use RDMA for data transfer. The bandwidth of
the network is 40 Gbps. The kernel version is 'Linux
2.6.18-164.11.1.el5_lustre.1.8.3 #1 SMP Fri Apr 9 18:00:39 MDT 2010 x86_64
x86_64 x86_64 GNU/Linux'. The OS is RHEL 5.5, the Lustre version is 1.8.3,
and the OFED version is 1.5.2. The IB HCA is a Mellanox Technologies MT26428
ConnectX VPI PCIe IB QDR.

I ran a simple test on the client side to measure peak read performance.
Here is the data:

#interval       Data transferred  Bandwidth
2 sec           2.18 GBytes     8.71 Gbits/sec
2 sec           2.06 GBytes     8.24 Gbits/sec
2 sec           2.10 GBytes     8.40 Gbits/sec
2 sec           1.93 GBytes     7.73 Gbits/sec
2 sec           1.50 GBytes     6.02 Gbits/sec
2 sec           420.00 MBytes   1.64 Gbits/sec
2 sec           2.19 GBytes     8.75 Gbits/sec
2 sec           2.08 GBytes     8.32 Gbits/sec
2 sec           2.08 GBytes     8.32 Gbits/sec
2 sec           1.99 GBytes     7.97 Gbits/sec
2 sec           1.80 GBytes     7.19 Gbits/sec
*2 sec           160.00 MBytes   640.00 Mbits/sec*
2 sec           2.15 GBytes     8.59 Gbits/sec
2 sec           2.13 GBytes     8.52 Gbits/sec
2 sec           2.15 GBytes     8.59 Gbits/sec
2 sec           2.09 GBytes     8.36 Gbits/sec
2 sec           2.09 GBytes     8.36 Gbits/sec
2 sec           2.07 GBytes     8.28 Gbits/sec
2 sec           2.15 GBytes     8.59 Gbits/sec
2 sec           2.11 GBytes     8.44 Gbits/sec
2 sec           2.05 GBytes     8.20 Gbits/sec
*2 sec           0.00 Bytes      0.00 bits/sec*
*2 sec           0.00 Bytes      0.00 bits/sec*
2 sec           1.95 GBytes     7.81 Gbits/sec
2 sec           2.14 GBytes     8.55 Gbits/sec
2 sec           1.99 GBytes     7.97 Gbits/sec
2 sec           2.00 GBytes     8.01 Gbits/sec
2 sec           370.00 MBytes   1.45 Gbits/sec
2 sec           1.96 GBytes     7.85 Gbits/sec
2 sec           2.03 GBytes     8.12 Gbits/sec
2 sec           1.89 GBytes     7.58 Gbits/sec
2 sec           1.94 GBytes     7.77 Gbits/sec
2 sec           640.00 MBytes   2.50 Gbits/sec
2 sec           1.47 GBytes     5.90 Gbits/sec
2 sec           1.94 GBytes     7.77 Gbits/sec
2 sec           1.90 GBytes     7.62 Gbits/sec
2 sec           1.94 GBytes     7.77 Gbits/sec
2 sec           1.18 GBytes     4.73 Gbits/sec
2 sec           940.00 MBytes   3.67 Gbits/sec
2 sec           1.97 GBytes     7.89 Gbits/sec
2 sec           1.93 GBytes     7.73 Gbits/sec
2 sec           1.87 GBytes     7.46 Gbits/sec
2 sec           1.77 GBytes     7.07 Gbits/sec
2 sec           320.00 MBytes   1.25 Gbits/sec
2 sec           1.97 GBytes     7.89 Gbits/sec
2 sec           2.00 GBytes     8.01 Gbits/sec
2 sec           1.89 GBytes     7.58 Gbits/sec
2 sec           1.93 GBytes     7.73 Gbits/sec
2 sec           350.00 MBytes   1.37 Gbits/sec
2 sec           1.77 GBytes     7.07 Gbits/sec
2 sec           1.92 GBytes     7.70 Gbits/sec
2 sec           2.05 GBytes     8.20 Gbits/sec
2 sec           2.01 GBytes     8.05 Gbits/sec
2 sec           710.00 MBytes   2.77 Gbits/sec
2 sec           1.59 GBytes     6.37 Gbits/sec
2 sec           2.00 GBytes     8.01 Gbits/sec
2 sec           710.00 MBytes   2.77 Gbits/sec
2 sec           1.59 GBytes     6.37 Gbits/sec
2 sec           2.00 GBytes     8.01 Gbits/sec
2 sec           1.88 GBytes     7.54 Gbits/sec
2 sec           1.62 GBytes     6.48 Gbits/sec
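
For reference, a measurement loop of this kind looks roughly like the
following (a minimal C sketch, not necessarily the exact tool behind the
numbers above; the test-file path, the 1 MiB buffer size, and the 2-second
reporting interval are assumptions):

/* Minimal sketch of a 2-second-interval read-throughput test.
 * Assumes a large pre-created test file on the Lustre mount; the
 * path, buffer size, and interval below are illustrative only. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <unistd.h>

#define BUF_SIZE (1 << 20)   /* 1 MiB read buffer */
#define INTERVAL 2.0         /* report every 2 seconds */

static double now(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec / 1e6;
}

int main(int argc, char **argv)
{
    const char *path = argc > 1 ? argv[1] : "/mnt/lustre/testfile";
    int fd = open(path, O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    char *buf = malloc(BUF_SIZE);
    double t0 = now();
    long long bytes = 0;
    ssize_t n;

    while ((n = read(fd, buf, BUF_SIZE)) > 0) {
        bytes += n;
        double t1 = now();
        if (t1 - t0 >= INTERVAL) {   /* print one table row per interval */
            printf("%.0f sec  %8.2f MBytes  %6.2f Gbits/sec\n",
                   t1 - t0, bytes / 1048576.0,
                   bytes * 8 / (t1 - t0) / 1e9);
            bytes = 0;
            t0 = t1;
        }
    }
    free(buf);
    close(fd);
    return 0;
}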


1. As you can see, although the peak bandwidth reaches 8.71 Gbits/sec, the
performance is quite unstable (sometimes the bandwidth simply gets choked, as
in the highlighted rows above). All the OSS nodes seem to stop serving reads
simultaneously. I tried grouping the OSTs differently and turning the
checksum on and off, but this still happens. Does anyone have a hint as to
the cause?

2. As we know, when a Lustre client reads data, the data is first moved from
the OSS's disks into its memory and then sent over the network to the client.
Apart from O_DIRECT, is there any other configuration that optimizes this
disk data access, such as using sendfile, splice, or fio, that could greatly
expedite it?

fio: http://freshmeat.net/projects/fio/
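
For the O_DIRECT case specifically, a minimal C sketch of a client-side
O_DIRECT read looks like this (the path, the 4096-byte alignment, and the
1 MiB chunk size are assumptions; O_DIRECT bypasses the client page cache,
so it helps mainly when caching hurts rather than as a blanket speedup):

/* Minimal sketch of an O_DIRECT read. O_DIRECT requires the buffer,
 * file offset, and transfer length to be suitably aligned; 4096
 * bytes is used here as a commonly safe alignment (an assumption). */
#define _GNU_SOURCE          /* for O_DIRECT */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define ALIGN 4096
#define CHUNK (1 << 20)      /* 1 MiB per read */

int main(int argc, char **argv)
{
    const char *path = argc > 1 ? argv[1] : "/mnt/lustre/testfile";
    int fd = open(path, O_RDONLY | O_DIRECT);
    if (fd < 0) { perror("open(O_DIRECT)"); return 1; }

    void *buf;
    if (posix_memalign(&buf, ALIGN, CHUNK) != 0) {
        fprintf(stderr, "posix_memalign failed\n");
        return 1;
    }

    long long total = 0;
    ssize_t n;
    while ((n = read(fd, buf, CHUNK)) > 0)
        total += n;
    if (n < 0)
        perror("read");      /* EINVAL usually indicates an alignment problem */

    printf("read %lld bytes with O_DIRECT\n", total);
    free(buf);
    close(fd);
    return 0;
}

(sendfile and splice, by contrast, move data between file descriptors
without a round trip through user-space buffers; whether they help on a
Lustre client depends on the client kernel's support for them.)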

Any help will be greatly appreciated. Thanks!



-- 
Best regards,

-----------------------------------------------------------------------------------------------
Li, Tan
PhD Candidate & Research Assistant,
Electrical Engineering,
Stony Brook University, NY

Personal Web Site: https://sites.google.com/site/homepagelitan/Home

Email: fanqielee at gmail.com