<div>Dear all, </div><div><br></div><div>I have two questions regarding the performance of our Lustre system. Currently, we have 5 OSS nodes, and each OSS carries 8 OSTs. All the nodes (including the MDT/MGS node and the client node) are connected to a Mellanox MTS 3600 InfiniBand switch and use RDMA for data transfer. The bandwidth of the network is 40 Gbps. The kernel version is 'Linux 2.6.18-164.11.1.el5_lustre.1.8.3 #1 SMP Fri Apr 9 18:00:39 MDT 2010 x86_64 x86_64 x86_64 GNU/Linux'. The OS is RHEL 5.5. The Lustre version is 1.8.3. The OFED version is 1.5.2. The IB HCA is a Mellanox Technologies MT26428 ConnectX VPI PCIe IB QDR.</div>
<div><br></div><div>1. I ran a simple test on the client side to measure the peak read performance. Here is the data:</div><div><br></div><div>#time Data transferred Bandwidth</div><div><div>2 sec 2.18 GBytes 8.71 Gbits/sec</div>
<div>2 sec 2.06 GBytes 8.24 Gbits/sec</div><div>2 sec 2.10 GBytes 8.40 Gbits/sec</div><div>2 sec 1.93 GBytes 7.73 Gbits/sec</div><div>2 sec 1.50 GBytes 6.02 Gbits/sec</div>
<div>2 sec 420.00 MBytes 1.64 Gbits/sec</div><div>2 sec 2.19 GBytes 8.75 Gbits/sec</div><div>2 sec 2.08 GBytes 8.32 Gbits/sec</div><div>2 sec 2.08 GBytes 8.32 Gbits/sec</div>
<div>2 sec 1.99 GBytes 7.97 Gbits/sec</div><div>2 sec 1.80 GBytes 7.19 Gbits/sec</div><div><b>2 sec 160.00 MBytes 640.00 Mbits/sec</b></div><div>2 sec 2.15 GBytes 8.59 Gbits/sec</div>
<div>2 sec 2.13 GBytes 8.52 Gbits/sec</div><div>2 sec 2.15 GBytes 8.59 Gbits/sec</div><div>2 sec 2.09 GBytes 8.36 Gbits/sec</div><div>2 sec 2.09 GBytes 8.36 Gbits/sec</div>
<div>2 sec 2.07 GBytes 8.28 Gbits/sec</div><div>2 sec 2.15 GBytes 8.59 Gbits/sec</div><div>2 sec 2.11 GBytes 8.44 Gbits/sec</div><div>2 sec 2.05 GBytes 8.20 Gbits/sec</div>
<div><b>2 sec 0.00 Bytes 0.00 bits/sec</b></div><div><b>2 sec 0.00 Bytes 0.00 bits/sec</b></div><div>2 sec 1.95 GBytes 7.81 Gbits/sec</div><div>2 sec 2.14 GBytes 8.55 Gbits/sec</div>
<div>2 sec 1.99 GBytes 7.97 Gbits/sec</div><div>2 sec 2.00 GBytes 8.01 Gbits/sec</div><div>2 sec 370.00 MBytes 1.45 Gbits/sec</div><div>2 sec 1.96 GBytes 7.85 Gbits/sec</div>
<div>2 sec 2.03 GBytes 8.12 Gbits/sec</div><div>2 sec 1.89 GBytes 7.58 Gbits/sec</div><div>2 sec 1.94 GBytes 7.77 Gbits/sec</div><div>2 sec 640.00 MBytes 2.50 Gbits/sec</div>
<div>2 sec 1.47 GBytes 5.90 Gbits/sec</div><div>2 sec 1.94 GBytes 7.77 Gbits/sec</div><div>2 sec 1.90 GBytes 7.62 Gbits/sec</div><div>2 sec 1.94 GBytes 7.77 Gbits/sec</div>
<div>2 sec 1.18 GBytes 4.73 Gbits/sec</div><div>2 sec 940.00 MBytes 3.67 Gbits/sec</div><div>2 sec 1.97 GBytes 7.89 Gbits/sec</div><div>2 sec 1.93 GBytes 7.73 Gbits/sec</div>
<div>2 sec 1.87 GBytes 7.46 Gbits/sec</div><div>2 sec 1.77 GBytes 7.07 Gbits/sec</div><div>2 sec 320.00 MBytes 1.25 Gbits/sec</div><div>2 sec 1.97 GBytes 7.89 Gbits/sec</div>
<div>2 sec 2.00 GBytes 8.01 Gbits/sec</div><div>2 sec 1.89 GBytes 7.58 Gbits/sec</div><div>2 sec 1.93 GBytes 7.73 Gbits/sec</div><div>2 sec 350.00 MBytes 1.37 Gbits/sec</div>
<div>2 sec 1.77 GBytes 7.07 Gbits/sec</div><div>2 sec 1.92 GBytes 7.70 Gbits/sec</div><div>2 sec 2.05 GBytes 8.20 Gbits/sec</div><div>2 sec 2.01 GBytes 8.05 Gbits/sec</div>
<div>2 sec 710.00 MBytes 2.77 Gbits/sec</div><div>2 sec 1.59 GBytes 6.37 Gbits/sec</div><div>2 sec 2.00 GBytes 8.01 Gbits/sec</div></div><div>
<div>2 sec 1.88 GBytes 7.54 Gbits/sec</div><div>2 sec 1.62 GBytes 6.48 Gbits/sec</div>
</div><div><br></div><div><br></div><div>As you can see, although the peak bandwidth can reach 8.71 Gbps, the performance is quite unstable (sometimes the bandwidth just gets choked). All the OSS nodes seem to stop reading data simultaneously. I tried grouping different OSTs and turning checksums on and off, but this still happens. Does anybody have a hint as to the reason?</div>
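<div><br></div><div>To make the stalls easier to spot, here is a minimal Python sketch that parses samples in the format above and flags the intervals that fall well below the median rate. The function names, the 50% cutoff, and the assumption that the tool uses 1024-based prefixes (which matches the table, e.g. 160 MBytes in 2 sec = 640 Mbits/sec) are mine and are not part of the original test.</div>

```python
import re
from statistics import median

# First twelve samples copied from the table above.
LOG = """\
2 sec 2.18 GBytes 8.71 Gbits/sec
2 sec 2.06 GBytes 8.24 Gbits/sec
2 sec 2.10 GBytes 8.40 Gbits/sec
2 sec 1.93 GBytes 7.73 Gbits/sec
2 sec 1.50 GBytes 6.02 Gbits/sec
2 sec 420.00 MBytes 1.64 Gbits/sec
2 sec 2.19 GBytes 8.75 Gbits/sec
2 sec 2.08 GBytes 8.32 Gbits/sec
2 sec 2.08 GBytes 8.32 Gbits/sec
2 sec 1.99 GBytes 7.97 Gbits/sec
2 sec 1.80 GBytes 7.19 Gbits/sec
2 sec 160.00 MBytes 640.00 Mbits/sec
"""

INTERVAL_SEC = 2  # every sample in the table covers 2 seconds

# Scale factors to Gbits/sec. Assumption: 1024-based prefixes,
# inferred from the numbers in the table.
UNIT_TO_GBITS = {
    "bits/sec": 1 / 1024**3,
    "Kbits/sec": 1 / 1024**2,
    "Mbits/sec": 1 / 1024,
    "Gbits/sec": 1.0,
}

# Matches: "2 sec 2.18 GBytes 8.71 Gbits/sec"
LINE_RE = re.compile(r"^\s*\d+\s+sec\s+[\d.]+\s+\w+\s+([\d.]+)\s+(\w+/sec)\s*$")


def parse_gbits(log):
    """Return the bandwidth column of each sample, in Gbits/sec."""
    rates = []
    for line in log.splitlines():
        m = LINE_RE.match(line)
        if m:
            rates.append(float(m.group(1)) * UNIT_TO_GBITS[m.group(2)])
    return rates


def find_stalls(rates, fraction=0.5):
    """Return (index, rate) for samples below fraction * median rate."""
    cutoff = fraction * median(rates)
    return [(i, r) for i, r in enumerate(rates) if cutoff > r]


rates = parse_gbits(LOG)
stalls = find_stalls(rates)
for i, r in stalls:
    start = i * INTERVAL_SEC
    print(f"stall at {start}-{start + INTERVAL_SEC} s: {r:.2f} Gbits/sec")
```

<div>Comparing the flagged timestamps with the OSS and client logs for the same window may help tell whether the stalls line up with something periodic on the servers.</div>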
<div><br></div><div>2. As we know, when a Lustre client reads data, the data is moved from the OSS disk into the OSS memory and then sent to the client. Apart from O_DIRECT, is there any other configuration that optimizes disk data access, such as using sendfile, splice or fio, and that could greatly speed it up?</div>
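<div><br></div><div>To show what I mean by sendfile, here is a small user-space sketch in Python that sends a local file over a socket without copying it through a user-space buffer. It only illustrates the generic system call on an ordinary file and a local socket pair. The helper name and the 4096-byte test payload are made up for the example, and it does not show anything Lustre itself does.</div>

```python
import os
import socket
import tempfile


def send_file_zero_copy(path, sock):
    """Send a whole file over a connected socket using os.sendfile."""
    sent = 0
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        while size > sent:
            # The kernel moves the data from the page cache to the socket;
            # no user-space buffer is involved. It may send less than asked,
            # so loop until the whole file is out.
            n = os.sendfile(sock.fileno(), f.fileno(), sent, size - sent)
            if n == 0:
                break
            sent += n
    return sent


# Demo on a small temporary file and a local socket pair.
payload = b"x" * 4096
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    path = tmp.name

a, b = socket.socketpair()
try:
    sent = send_file_zero_copy(path, a)
    received = b""
    while len(payload) > len(received):
        chunk = b.recv(len(payload) - len(received))
        if not chunk:
            break
        received += chunk
finally:
    a.close()
    b.close()
    os.unlink(path)

print(f"sent {sent} bytes, received {len(received)} bytes")
```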
<div><br></div><div>fio: <a href="http://freshmeat.net/projects/fio/">http://freshmeat.net/projects/fio/</a></div><div><br></div><div>Any help will be greatly appreciated. Thanks!</div><div><br></div><div><br></div><div><br>
</div><div>-- </div><div>Best regards,</div>
<div> </div>
<div>-----------------------------------------------------------------------------------------------<br>Li, Tan<br>PhD Candidate & Research Assistant, <br>Electrical Engineering, <br>Stony Brook University, NY<br><br>
Personal Web Site: <a href="https://sites.google.com/site/homepagelitan/Home" target="_blank">https://sites.google.com/site/homepagelitan/Home</a><br><br>Email: <a href="mailto:fanqielee@gmail.com" target="_blank">fanqielee@gmail.com</a></div>
<br>