[lustre-discuss] Very bad lnet ethernet read performance

Louis Bailleul Louis.Bailleul at pgs.com
Mon Aug 12 10:07:51 PDT 2019


Hi all,

I am trying to understand what I am doing wrong here.
I have a Lustre 2.12.1 system backed by NVMe drives under ZFS, for which obdfilter-survey gives decent values:
ost  2 sz 536870912K rsz 1024K obj    2 thr  256 write 15267.49 [6580.36, 8664.20] rewrite 15225.24 [6559.05, 8900.54] read 19739.86 [9062.25, 10429.04]
But my actual Lustre performance is pretty poor in comparison (I can't top 8GB/s write and 13.5GB/s read).
So I started to question my LNet tuning, but playing with peer_credits and max_pages_per_rpc didn't help (sketches of both below).
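
For reference, the survey was driven roughly like this on the OSS (a sketch: the OST names are illustrative and the nobjhi/thrhi upper bounds are approximate, chosen to line up with the obj/thr columns above):

# from lustre-iokit, run on the OSS
# nobjhi=2 thrhi=256 targets="lustre-OST0000 lustre-OST0001" obdfilter-survey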
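
And this is the client-side knob I actually meant by max_pages_per_rpc, together with the RPC concurrency setting (values illustrative, applied on the clients):

# lctl set_param osc.*.max_pages_per_rpc=1024
# lctl set_param osc.*.max_rpcs_in_flight=32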

My test setup consists of 133x10G Ethernet clients (uplinks between the end devices and the OSS are 2x100G for every 20 nodes).
The single OSS is fitted with a bond of 2x100G Ethernet (a quick way to check the bond configuration is shown below).
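
In case the bonding setup matters, the mode and transmit hash policy can be checked like this (interface name illustrative; with an 802.3ad bond a single TCP flow hashes onto one slave, so any one connection is capped at 100G):

# grep -E 'Bonding Mode|Transmit Hash' /proc/net/bonding/bond0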

I have tried to understand the problem using lnet_selftest, but I'll need some help (or documentation pointers) as the results don't make sense to me. A sketch of how I drove it follows.

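Roughly the lnet_selftest session, for reference (NIDs illustrative; the group names match the stats below):

# export LST_SESSION=$$
# lst new_session read_test
# lst add_group lfrom 10.0.0.[10-19]@tcp
# lst add_group lto 10.0.0.2@tcp
# lst add_batch bulk_read
# lst add_test --batch bulk_read --from lfrom --to lto brw read size=1M
# lst run bulk_read
# lst stat lfrom lto
# lst end_session
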
Testing a single 10G client:
[LNet Rates of lfrom]
[R] Avg: 2231     RPC/s Min: 2231     RPC/s Max: 2231     RPC/s
[W] Avg: 1156     RPC/s Min: 1156     RPC/s Max: 1156     RPC/s
[LNet Bandwidth of lfrom]
[R] Avg: 1075.16  MiB/s Min: 1075.16  MiB/s Max: 1075.16  MiB/s
[W] Avg: 0.18     MiB/s Min: 0.18     MiB/s Max: 0.18     MiB/s
[LNet Rates of lto]
[R] Avg: 1179     RPC/s Min: 1179     RPC/s Max: 1179     RPC/s
[W] Avg: 2254     RPC/s Min: 2254     RPC/s Max: 2254     RPC/s
[LNet Bandwidth of lto]
[R] Avg: 0.19     MiB/s Min: 0.19     MiB/s Max: 0.19     MiB/s
[W] Avg: 1075.17  MiB/s Min: 1075.17  MiB/s Max: 1075.17  MiB/s
With 10x10G clients:
[LNet Rates of lfrom]
[R] Avg: 1416     RPC/s Min: 1102     RPC/s Max: 1642     RPC/s
[W] Avg: 708      RPC/s Min: 551      RPC/s Max: 821      RPC/s
[LNet Bandwidth of lfrom]
[R] Avg: 708.20   MiB/s Min: 550.77   MiB/s Max: 820.96   MiB/s
[W] Avg: 0.11     MiB/s Min: 0.08     MiB/s Max: 0.13     MiB/s
[LNet Rates of lto]
[R] Avg: 7084     RPC/s Min: 7084     RPC/s Max: 7084     RPC/s
[W] Avg: 14165    RPC/s Min: 14165    RPC/s Max: 14165    RPC/s
[LNet Bandwidth of lto]
[R] Avg: 1.08     MiB/s Min: 1.08     MiB/s Max: 1.08     MiB/s
[W] Avg: 7081.86  MiB/s Min: 7081.86  MiB/s Max: 7081.86  MiB/s

With all 133x10G clients:
[LNet Rates of lfrom]
[R] Avg: 510      RPC/s Min: 98       RPC/s Max: 23457    RPC/s
[W] Avg: 510      RPC/s Min: 49       RPC/s Max: 45863    RPC/s
[LNet Bandwidth of lfrom]
[R] Avg: 169.87   MiB/s Min: 48.77    MiB/s Max: 341.26   MiB/s
[W] Avg: 169.86   MiB/s Min: 0.01     MiB/s Max: 22757.92 MiB/s
[LNet Rates of lto]
[R] Avg: 23458    RPC/s Min: 23458    RPC/s Max: 23458    RPC/s
[W] Avg: 45876    RPC/s Min: 45876    RPC/s Max: 45876    RPC/s
[LNet Bandwidth of lto]
[R] Avg: 341.12   MiB/s Min: 341.12   MiB/s Max: 341.12   MiB/s
[W] Avg: 22758.42 MiB/s Min: 22758.42 MiB/s Max: 22758.42 MiB/s

So if I add clients, the aggregate write bandwidth more or less stacks, but the read bandwidth decreases???
When throwing all the nodes at the system, I am pretty happy with the ~22GB/s write, as this is about 90% of the 2x100G bond, but the 341MB/s read looks very weird considering it is a third of what a single client achieves on its own.
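
In case someone can point me at the right counter, this is what I have been watching on the OSS while the tests run:

# lnetctl stats show    # message, send/recv and drop counts
# lnetctl net show -v   # per-NI tunables and stats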

These are my ksocklnd tunings:

# for i in /sys/module/ksocklnd/parameters/*; do echo "$i : $(cat $i)"; done
/sys/module/ksocklnd/parameters/credits : 1024
/sys/module/ksocklnd/parameters/eager_ack : 0
/sys/module/ksocklnd/parameters/enable_csum : 0
/sys/module/ksocklnd/parameters/enable_irq_affinity : 0
/sys/module/ksocklnd/parameters/inject_csum_error : 0
/sys/module/ksocklnd/parameters/keepalive : 30
/sys/module/ksocklnd/parameters/keepalive_count : 5
/sys/module/ksocklnd/parameters/keepalive_idle : 30
/sys/module/ksocklnd/parameters/keepalive_intvl : 5
/sys/module/ksocklnd/parameters/max_reconnectms : 60000
/sys/module/ksocklnd/parameters/min_bulk : 1024
/sys/module/ksocklnd/parameters/min_reconnectms : 1000
/sys/module/ksocklnd/parameters/nagle : 0
/sys/module/ksocklnd/parameters/nconnds : 4
/sys/module/ksocklnd/parameters/nconnds_max : 64
/sys/module/ksocklnd/parameters/nonblk_zcack : 1
/sys/module/ksocklnd/parameters/nscheds : 12
/sys/module/ksocklnd/parameters/peer_buffer_credits : 0
/sys/module/ksocklnd/parameters/peer_credits : 128
/sys/module/ksocklnd/parameters/peer_timeout : 180
/sys/module/ksocklnd/parameters/round_robin : 1
/sys/module/ksocklnd/parameters/rx_buffer_size : 0
/sys/module/ksocklnd/parameters/sock_timeout : 50
/sys/module/ksocklnd/parameters/tx_buffer_size : 0
/sys/module/ksocklnd/parameters/typed_conns : 1
/sys/module/ksocklnd/parameters/zc_min_payload : 16384
/sys/module/ksocklnd/parameters/zc_recv : 0
/sys/module/ksocklnd/parameters/zc_recv_min_nfrags : 16

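For completeness, the non-default values above are set at module load time (file name illustrative; values copied from the dump):

# cat /etc/modprobe.d/ksocklnd.conf
options ksocklnd credits=1024 peer_credits=128 nscheds=12
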
Best regards,
Louis