[Lustre-discuss] Modifying Lustre network (good practices)

Olivier Hargoaa olivier.hargoaa at bull.fr
Thu May 20 09:27:31 PDT 2010


Hi Brian and all others,

I'm sorry for not giving you all details. Here I will send you all 
information I have.

Regarding our configuration:
The Lustre IO nodes are connected with two bonded 10Gb links.
The compute nodes are connected with two bonded 1Gb links.
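For reference, we check the bonding mode and slave status on each node with
something like the following (assuming the bond interface is named bond0 on
our systems):

   cat /proc/net/bonding/bond0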

Raw performance on the servers is fine, for both write and read, on each OST.

First, we ran iperf (several times) and obtained the expected rates. The 
results are symmetric (read and write) with any number of threads.
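The iperf runs were along these lines (the hostname, stream count and
duration here are just placeholders):

   # on an IO node
   iperf -s
   # on a compute node: 4 parallel streams for 30 seconds
   iperf -c ionode1 -P 4 -t 30
   # then the same with client and server roles swapped, to test the other direction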

Then we tested with LNet selftest.
Here is our lst command for the write test:
lst add_test --batch bulkr --from c --to s brw write check=simple size=1M
and the results are:
[LNet Rates of c]
[R] Avg: 110      RPC/s Min: 110      RPC/s Max: 110      RPC/s
[W] Avg: 219      RPC/s Min: 219      RPC/s Max: 219      RPC/s
[LNet Bandwidth of c]
[R] Avg: 0.02     MB/s  Min: 0.02     MB/s  Max: 0.02     MB/s
[W] Avg: 109.20   MB/s  Min: 109.20   MB/s  Max: 109.20   MB/s
[LNet Rates of c]
[R] Avg: 109      RPC/s Min: 109      RPC/s Max: 109      RPC/s
[W] Avg: 217      RPC/s Min: 217      RPC/s Max: 217      RPC/s
[LNet Bandwidth of c]
[R] Avg: 0.02     MB/s  Min: 0.02     MB/s  Max: 0.02     MB/s
[W] Avg: 108.40   MB/s  Min: 108.40   MB/s  Max: 108.40   MB/s
[LNet Rates of c]
[R] Avg: 109      RPC/s Min: 109      RPC/s Max: 109      RPC/s
[W] Avg: 217      RPC/s Min: 217      RPC/s Max: 217      RPC/s
[LNet Bandwidth of c]
[R] Avg: 0.02     MB/s  Min: 0.02     MB/s  Max: 0.02     MB/s
[W] Avg: 108.40   MB/s  Min: 108.40   MB/s  Max: 108.40   MB/s

and now for the read test:
[LNet Rates of c]
[R] Avg: 10       RPC/s Min: 10       RPC/s Max: 10       RPC/s
[W] Avg: 5        RPC/s Min: 5        RPC/s Max: 5        RPC/s
[LNet Bandwidth of c]
[R] Avg: 4.59     MB/s  Min: 4.59     MB/s  Max: 4.59     MB/s
[W] Avg: 0.00     MB/s  Min: 0.00     MB/s  Max: 0.00     MB/s
[LNet Rates of c]
[R] Avg: 10       RPC/s Min: 10       RPC/s Max: 10       RPC/s
[W] Avg: 5        RPC/s Min: 5        RPC/s Max: 5        RPC/s
[LNet Bandwidth of c]
[R] Avg: 4.79     MB/s  Min: 4.79     MB/s  Max: 4.79     MB/s
[W] Avg: 0.00     MB/s  Min: 0.00     MB/s  Max: 0.00     MB/s
[LNet Rates of c]
[R] Avg: 10       RPC/s Min: 10       RPC/s Max: 10       RPC/s
[W] Avg: 5        RPC/s Min: 5        RPC/s Max: 5        RPC/s
[LNet Bandwidth of c]
[R] Avg: 4.79     MB/s  Min: 4.79     MB/s  Max: 4.79     MB/s
[W] Avg: 0.00     MB/s  Min: 0.00     MB/s  Max: 0.00     MB/s
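For reference, the full lst session around that add_test looks roughly like
this (the group NID lists are placeholders for our compute and IO nodes; for
the read test we simply replace "write" with "read"):

   modprobe lnet_selftest
   export LST_SESSION=$$
   lst new_session rw_test
   lst add_group c 192.168.1.[10-20]@tcp   # compute nodes (placeholder NIDs)
   lst add_group s 192.168.1.[1-2]@tcp     # IO servers (placeholder NIDs)
   lst add_batch bulkr
   lst add_test --batch bulkr --from c --to s brw write check=simple size=1M
   lst run bulkr
   lst stat c          # produces the [LNet Rates]/[LNet Bandwidth] blocks above
   lst stop bulkr
   lst end_session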

IOzone shows the same asymmetric results as LNet selftest.

With just one OST:
For writes, we get 233 MB/s; given that the theoretical maximum is 250 MB/s, 
this is a very good result: it works fine.

For reads, the maximum we get is 149 MB/s with three threads (processes: 
-t 3). If we configure four threads (-t 4), we get 50 MB/s.
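The iozone throughput runs were along these lines (the file size and paths
are placeholders):

   # 3 threads, 1MB records, write (-i 0) then read (-i 1)
   iozone -i 0 -i 1 -r 1m -s 4g -t 3 -F /mnt/lustre/f1 /mnt/lustre/f2 /mnt/lustre/f3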

We also verified in the brw_stats file that we are using a 1MB block size 
(for both reads and writes).
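(On the OSS, that histogram can be read with something like:

   lctl get_param obdfilter.*.brw_stats

which shows the distribution of I/O sizes per OST.)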

So the problem only appears with iozone over Lustre and with LNet selftest.

Thanks to all.


Brian J. Murrell wrote:
> On Thu, 2010-05-20 at 16:27 +0200, Olivier Hargoaa wrote: 
>> On Lustre we get poor read performance and good write performance, so 
>> we decided to modify the Lustre network in order to see whether the 
>> problem comes from the network layer.
> 
> Without having any other information other than your statement that
> "performance is good in one direction but not the other" I wonder why
> you consider the network as being the most likely candidate as a culprit
> for this problem.  I haven't come across very many networks that
> (weren't designed to be and yet) are fast in one direction and slow in
> the other.
> 
>> Therefore, we'll perform the following steps: we will umount the 
>> filesystem, reformat the mgs, change lnet options in modprobe file, 
>> start new mgs server, and finally modify our ost and mdt with 
>> tunefs.lustre with failover and mgs new nids using "--erase-params" and 
>> "--writeconf" options.
> 
> Sounds like a lot of rigmarole to test something that I would consider
> to be of low probability (given the brief amount of information you have
> provided).  But even if I did suspect the network were slow in only one
> direction, before I started mucking with reconfiguring Lustre for
> different networks, I would do some basic network throughput testing to
> verify my hypothesis and adjust the probability of the network being the
> problem accordingly.
> 
> Did you do any hardware profiling (i.e. using the lustre-iokit) before
> deploying Lustre on this hardware?  We always recommend profiling the
> hardware for exactly this reason: explaining performance problems.
> 
> Unfortunately, now that you have data on the hardware, it's much more
> difficult to profile the hardware because to do it properly, you need to
> be able to write to the disks.
> 
> b.
> 