[Lustre-discuss] [HPDD-discuss] Same performance Infiniband and Ethernet

Pardo Diaz, Alfonso alfonso.pardo at ciemat.es
Tue May 20 23:32:54 PDT 2014


Thanks Richard, I appreciate your advice.

I was able to saturate the channel using xdd with 10 threads, each writing to a different OST on a different OSS. These are the results:

ETHERNET
                     T    Q         Bytes      Ops   Time(s)   Rate(MB/s)      IOPS   Latency(s)     %CPU
TARGET   Average     0    1    2147483648    65536   140.156       15.322    467.59       0.0021    39.16
TARGET   Average     1    1    2147483648    65536   140.785       15.254    465.50       0.0021    39.11
TARGET   Average     2    1    2147483648    65536   140.559       15.278    466.25       0.0021    39.14
TARGET   Average     3    1    2147483648    65536   176.141       12.192    372.07       0.0027    38.02
TARGET   Average     4    1    2147483648    65536   168.234       12.765    389.55       0.0026    38.54
TARGET   Average     5    1    2147483648    65536   140.823       15.250    465.38       0.0021    39.11
TARGET   Average     6    1    2147483648    65536   140.183       15.319    467.50       0.0021    39.16
TARGET   Average     8    1    2147483648    65536   176.432       12.172    371.45       0.0027    38.02
TARGET   Average     9    1    2147483648    65536   167.944       12.787    390.23       0.0026    38.57
         Combined   10   10   21474836480   655360   180.000      119.305   3640.89       0.0003   387.99

INFINIBAND
                     T    Q         Bytes      Ops   Time(s)   Rate(MB/s)       IOPS   Latency(s)      %CPU
TARGET   Average     0    1    2147483648    65536     9.369      229.217    6995.16       0.0001    480.40
TARGET   Average     1    1    2147483648    65536     9.540      225.110    6869.80       0.0001    474.25
TARGET   Average     2    1    2147483648    65536     8.963      239.582    7311.45       0.0001    479.85
TARGET   Average     3    1    2147483648    65536     9.480      226.521    6912.86       0.0001    478.21
TARGET   Average     4    1    2147483648    65536     9.109      235.748    7194.47       0.0001    480.83
TARGET   Average     5    1    2147483648    65536     9.284      231.299    7058.69       0.0001    479.04
TARGET   Average     6    1    2147483648    65536     8.839      242.947    7414.15       0.0001    480.55
TARGET   Average     7    1    2147483648    65536     9.210      233.166    7115.65       0.0001    480.17
TARGET   Average     8    1    2147483648    65536     9.373      229.125    6992.33       0.0001    475.13
TARGET   Average     9    1    2147483648    65536     9.184      233.828    7135.86       0.0001    480.25
         Combined   10   10   21474836480   655360     9.540     2251.097   68698.03       0.0000   4788.69


From the combined rates, that is roughly 0.95 Gbit/s over Ethernet (119.305 MB/s x 8; link maximum 1 Gbit/s) and roughly 18 Gbit/s over InfiniBand (2251.097 MB/s x 8; link maximum 40 Gbit/s).
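
For reference, the xdd invocation was along these lines (a sketch only: the file paths are illustrative, and the flag values are reconstructed from the tables above, where 65536 ops x 32 KiB = 2 GiB per target at queue depth 1):

  # hypothetical reconstruction of the 10-target write test
  xdd -op write \
      -targets 10 /mnt/lustre/xdd/file0 /mnt/lustre/xdd/file1 /mnt/lustre/xdd/file2 \
                  /mnt/lustre/xdd/file3 /mnt/lustre/xdd/file4 /mnt/lustre/xdd/file5 \
                  /mnt/lustre/xdd/file6 /mnt/lustre/xdd/file7 /mnt/lustre/xdd/file8 \
                  /mnt/lustre/xdd/file9 \
      -blocksize 32768 -reqsize 1 -numreqs 65536 -queuedepth 1 -verbose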

Regards!


On 19/05/2014, at 17:37, Mohr Jr, Richard Frank (Rick Mohr) <rmohr at utk.edu> wrote:

> Alfonso,
> 
> Based on my attempts to benchmark single-client Lustre performance, here are some comments and advice. (YMMV)
> 
> 1) On the IB client, I recommend disabling checksums (lctl set_param osc.*.checksums=0). Having checksums enabled sometimes results in a significant performance hit. (See the sketch below.)
> 
> 2) Single-threaded tests (like dd) will usually bottleneck before you can max out the total client performance.  You need to use a multi-threaded tool (like xdd) and have several threads perform IO at the same time in order to measure aggregate single client performance.
> 
> 3) When using a tool like xdd, set up the test to run for a fixed amount of time rather than having each thread write a fixed amount of data.  If all threads write a fixed amount of data (say 1 GB), and if any of the threads run slower than others, you might get skewed results for the aggregate throughput because of the stragglers.
> 
> 4) In order to avoid contention at the OST level among the multiple threads on a single client, precreate the output files with stripe_count=1 and statically assign them evenly to the different OSTs, as in the sketch below.  Have each thread write to a different file so that no two processes write to the same OST.  If you don't have enough OSTs to saturate the client, you can always have two files per OST.  Going beyond that will likely hurt more than help, at least for an ldiskfs backend.
> 
> 5) In my testing, I seem to get worse results using direct I/O for write tests, so I usually just use buffered I/O.  Based on my understanding, the max_dirty_mb parameter on the client (which defaults to 32 MB) limits the amount of dirty written data that can be cached for each OST.  Unless you have increased this to a very large number, that parameter will likely mitigate any effects of client caching on the test results.  (NOTE: This reasoning only applies to write tests.  Any written data can still be cached by the client, and a subsequent read test might very well pull data from cache unless you have taken steps to flush the cached data.)
> 
> If you have 10 OSS nodes and 20 OSTs in your file system, I would start by running a test with 10 threads, with each thread writing to a single OST on a different server.  You can increase/decrease the number of threads as needed to see if the aggregate performance gets better/worse.  On my clients with QDR IB, I typically see aggregate write speeds in the range of 2.5-3.0 GB/s.
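> 
> As a concrete sketch of points 1, 4, and 5 (the mount point and file names are illustrative, and the loop assumes the ten chosen OST indices land on different servers):
> 
>   # (1) check and disable client-side checksums (set_param does not
>   #     persist across a remount)
>   lctl get_param osc.*.checksums
>   lctl set_param osc.*.checksums=0
> 
>   # (4) precreate one single-stripe file per OST, pinned to OST index $i,
>   #     so that each xdd thread writes to its own OST
>   for i in $(seq 0 9); do
>       lfs setstripe -c 1 -i $i /mnt/lustre/xdd/file$i
>   done
> 
>   # (5) check how much dirty write data each OSC may cache (default 32 MB)
>   lctl get_param osc.*.max_dirty_mb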
> 
> You are probably already aware of this, but just in case, make sure that the IB clients you use for testing don't also have Ethernet connections to your OSS servers.  If the client has both an Ethernet and an IB path to the same server, it will choose one of the paths to use.  It could end up choosing Ethernet instead of IB and skew your results.
> 
> -- 
> Rick Mohr
> Senior HPC System Administrator
> National Institute for Computational Sciences
> http://www.nics.tennessee.edu
> 
> 
> On May 19, 2014, at 6:33 AM, "Pardo Diaz, Alfonso" <alfonso.pardo at ciemat.es> wrote:
> 
>> Hi,
>> 
>> I have migrated my Lustre 2.2 filesystem to 2.5.1 and equipped my OSS/MDS nodes and clients with InfiniBand QDR interfaces.
>> I have compiled Lustre against OFED 3.2 and configured the lnet module with:
>> 
>> options lnet networks="o2ib(ib0),tcp(eth0)"
>> 
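>> A quick way to verify which NIDs the client brings up (lctl list_nids and lctl ping are standard commands; the NID in the ping is just an example):
>> 
>>   lctl list_nids                  # should show both an @o2ib and a @tcp NID
>>   lctl ping 192.168.55.10@o2ib    # ping an OSS over the IB network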
>> 
>> But when I compare Lustre performance over InfiniBand (o2ib) and over Ethernet (tcp), I get nearly the same result:
>> 
>> INFINIBAND TEST:
>> dd if=/dev/zero of=test.dat bs=1M count=1000
>> 1000+0 records in
>> 1000+0 records out
>> 1048576000 bytes (1.0 GB) copied, 5.88433 s, 178 MB/s
>> 
>> ETHERNET TEST:
>> dd if=/dev/zero of=test.dat bs=1M count=1000
>> 1000+0 records in
>> 1000+0 records out
>> 1048576000 bytes (1.0 GB) copied, 5.97423 s, 154 MB/s
>> 
>> 
>> And this is my scenario:
>> 
>> - 1 MDS with an SSD RAID10 MDT
>> - 10 OSS with 2 OSTs per OSS
>> - InfiniBand interfaces in connected mode
>> - CentOS 6.5
>> - Lustre 2.5.1
>> - Striped filesystem: "lfs setstripe -s 1M -c 10"
>> 
>> 
>> I know my InfiniBand is working correctly, because when I use iperf3 between the client and the servers I get 40 Gbit/s over InfiniBand and 1 Gbit/s over the Ethernet connections.
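>> 
>> The iperf3 runs were along these lines (the addresses are illustrative):
>> 
>>   # on one of the servers
>>   iperf3 -s
>>   # on the client: once against the IPoIB address, once against Ethernet
>>   iperf3 -c 10.0.0.1      # ib0 (IPoIB)
>>   iperf3 -c 192.168.0.1   # eth0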
>> 
>> 
>> 
>> Could you help me?
>> 
>> 
>> Regards,
>> 
>> 
>> 
>> 
>> 
>> Alfonso Pardo Diaz
>> System Administrator / Researcher
>> c/ Sola nº 1; 10200 Trujillo, ESPAÑA
>> Tel: +34 927 65 93 17 Fax: +34 927 32 32 37
>> 
>> 
>> 
>> 
>> 



