[Lustre-discuss] Two questions about the tuning of Lustre file system.

Kevin Van Maren kevin.van.maren at oracle.com
Fri May 20 08:23:37 PDT 2011


What exactly were you testing?  I have no idea how to interpret your 
numbers.  A single client reading from a single file?  One file per OST, 
or a file striped across all OSTs?  Is the Lustre file system idle except 
for your test?
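
(For reference, lfs getstripe shows how an existing file is laid out; the 
path below is just a placeholder:

    lfs getstripe /mnt/lustre/testfile

It prints the stripe count, stripe size, and the OST index of each object 
backing the file.)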

In general, start with the pieces:
1) make sure the network is sane.  Try measuring BW to/from each node 
(client and server) to ensure all the cables are good.  For your 
configuration, you should be able to measure ~3.2GB/s (unidirectional) 
using large MPI messages.  While I prefer to use MPI, some people use 
lnet_selftest (see the sketch after this list).
2) make sure each OST is sane.  For each OST, create a file that is 
striped only on that OST (see the lfs setstripe sketch below).  Make sure 
a client can read/write each of these files as expected.  Be sure you 
transfer much more data than the client+server RAM sizes.
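
A minimal lnet_selftest sketch for step 1, assuming placeholder NIDs for 
one client and one server on o2ib (substitute your own):

    # load the selftest module on the console and test nodes
    modprobe lnet_selftest
    # lst commands are tied together by a session id
    export LST_SESSION=$$
    lst new_session read_bw
    # one group for the client, one for the server under test
    lst add_group clients 192.168.1.10@o2ib
    lst add_group servers 192.168.1.20@o2ib
    # 1MB bulk reads from the client group to the server group
    lst add_batch bulk_read
    lst add_test --batch bulk_read --from clients --to servers brw read size=1M
    lst run bulk_read
    lst stat servers        # prints transfer rates until interrupted
    lst end_session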
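
And a sketch of the per-OST files for step 2, assuming your 5 OSS x 8 OST 
setup gives OST indices 0-39 and a /mnt/lustre mount point; size the dd 
so it comfortably exceeds client+server RAM (64GB shown):

    # pin one test file to each OST: -c 1 = one stripe, -i N = OST index N
    for i in $(seq 0 39); do
        lfs setstripe -c 1 -i $i /mnt/lustre/ost_test.$i
    done
    # write, then read back, timing each pass; repeat for each OST index
    dd if=/dev/zero of=/mnt/lustre/ost_test.0 bs=1M count=65536
    dd if=/mnt/lustre/ost_test.0 of=/dev/null bs=1M

Reading back from a different client than the writer also keeps the client 
cache out of the read numbers.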

Many issues are sorted out just by getting both 1 & 2 into good shape.

Kevin



Tanin wrote:
> Dear all, 
>
> I have two questions regarding the performance of our Lustre file system. 
> Currently, we have 5 OSS nodes, and each OSS carries 8 OSTs. All the 
> nodes (including the MDT/MGS node and client node) are connected to a 
> Mellanox MTS 3600 InfiniBand switch using RDMA for data transfer. The 
> bandwidth of the network is 40 Gbps. The kernel version is 'Linux 
> 2.6.18-164.11.1.el5_lustre.1.8.3 #1 SMP Fri Apr 9 18:00:39 MDT 2010 
> x86_64 x86_64 x86_64 GNU/Linux'. OS is RHEL 5.5.  Lustre version is 
> 1.8.3. OFED Version is 1.5.2. IB HCA is Mellanox Technologies MT26428 
> ConnectX VPI PCIe IB QDR.
>
> I ran a simple test on the client side to measure the peak read 
> performance. Here is the data:
>
> #interval      Data transferred  Bandwidth
> 2 sec           2.18 GBytes     8.71 Gbits/sec
> 2 sec           2.06 GBytes     8.24 Gbits/sec
> 2 sec           2.10 GBytes     8.40 Gbits/sec
> 2 sec           1.93 GBytes     7.73 Gbits/sec
> 2 sec           1.50 GBytes     6.02 Gbits/sec
> 2 sec           420.00 MBytes   1.64 Gbits/sec
> 2 sec           2.19 GBytes     8.75 Gbits/sec
> 2 sec           2.08 GBytes     8.32 Gbits/sec
> 2 sec           2.08 GBytes     8.32 Gbits/sec
> 2 sec           1.99 GBytes     7.97 Gbits/sec
> 2 sec           1.80 GBytes     7.19 Gbits/sec
> *2 sec           160.00 MBytes   640.00 Mbits/sec*
> 2 sec           2.15 GBytes     8.59 Gbits/sec
> 2 sec           2.13 GBytes     8.52 Gbits/sec
> 2 sec           2.15 GBytes     8.59 Gbits/sec
> 2 sec           2.09 GBytes     8.36 Gbits/sec
> 2 sec           2.09 GBytes     8.36 Gbits/sec
> 2 sec           2.07 GBytes     8.28 Gbits/sec
> 2 sec           2.15 GBytes     8.59 Gbits/sec
> 2 sec           2.11 GBytes     8.44 Gbits/sec
> 2 sec           2.05 GBytes     8.20 Gbits/sec
> *2 sec           0.00 Bytes      0.00 bits/sec*
> *2 sec           0.00 Bytes      0.00 bits/sec*
> 2 sec           1.95 GBytes     7.81 Gbits/sec
> 2 sec           2.14 GBytes     8.55 Gbits/sec
> 2 sec           1.99 GBytes     7.97 Gbits/sec
> 2 sec           2.00 GBytes     8.01 Gbits/sec
> 2 sec           370.00 MBytes   1.45 Gbits/sec
> 2 sec           1.96 GBytes     7.85 Gbits/sec
> 2 sec           2.03 GBytes     8.12 Gbits/sec
> 2 sec           1.89 GBytes     7.58 Gbits/sec
> 2 sec           1.94 GBytes     7.77 Gbits/sec
> 2 sec           640.00 MBytes   2.50 Gbits/sec
> 2 sec           1.47 GBytes     5.90 Gbits/sec
> 2 sec           1.94 GBytes     7.77 Gbits/sec
> 2 sec           1.90 GBytes     7.62 Gbits/sec
> 2 sec           1.94 GBytes     7.77 Gbits/sec
> 2 sec           1.18 GBytes     4.73 Gbits/sec
> 2 sec           940.00 MBytes   3.67 Gbits/sec
> 2 sec           1.97 GBytes     7.89 Gbits/sec
> 2 sec           1.93 GBytes     7.73 Gbits/sec
> 2 sec           1.87 GBytes     7.46 Gbits/sec
> 2 sec           1.77 GBytes     7.07 Gbits/sec
> 2 sec           320.00 MBytes   1.25 Gbits/sec
> 2 sec           1.97 GBytes     7.89 Gbits/sec
> 2 sec           2.00 GBytes     8.01 Gbits/sec
> 2 sec           1.89 GBytes     7.58 Gbits/sec
> 2 sec           1.93 GBytes     7.73 Gbits/sec
> 2 sec           350.00 MBytes   1.37 Gbits/sec
> 2 sec           1.77 GBytes     7.07 Gbits/sec
> 2 sec           1.92 GBytes     7.70 Gbits/sec
> 2 sec           2.05 GBytes     8.20 Gbits/sec
> 2 sec           2.01 GBytes     8.05 Gbits/sec
> 2 sec           710.00 MBytes   2.77 Gbits/sec
> 2 sec           1.59 GBytes     6.37 Gbits/sec
> 2 sec           2.00 GBytes     8.01 Gbits/sec
> 2 sec           710.00 MBytes   2.77 Gbits/sec
> 2 sec           1.59 GBytes     6.37 Gbits/sec
> 2 sec           2.00 GBytes     8.01 Gbits/sec
> 2 sec           1.88 GBytes     7.54 Gbits/sec
> 2 sec           1.62 GBytes     6.48 Gbits/sec
>
>
> 1. As you can see, although the peak bandwidth can reach 8.71 Gbps, the 
> performance is quite unstable (sometimes the bandwidth just gets 
> choked). All the OSS nodes seem to stop serving data simultaneously. 
> I tried grouping different OSTs and turning the checksum on and off, 
> but this still happens. Does anybody have a hint as to the reason?
>
> 2. As we know, when a Lustre client reads data, the data is moved from 
> the OSS disk into OSS memory and then sent to the client. Apart from 
> O_DIRECT, is there any other configuration to optimize disk data 
> access, such as sendfile, splice, or fio, that could greatly speed 
> it up?
>
> fio: http://freshmeat.net/projects/fio/
>
> Any help will be greatly appreciated. Thanks!
>
>
>
> -- 
> Best regards,
>  
> -----------------------------------------------------------------------------------------------
> Li, Tan
> PhD Candidate & Research Assistant,
> Electrical Engineering,
> Stony Brook University, NY
>
> Personal Web Site: https://sites.google.com/site/homepagelitan/Home
>
> Email: fanqielee at gmail.com
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>   



