[lustre-discuss] Lnet Self Test
Pinkesh Valdria
pinkesh.valdria at oracle.com
Wed Dec 4 21:14:01 PST 2019
Thanks Jongwoo.
I have the MTU set to 9000 and the ring buffer settings raised to their maximums.
ip link set dev $primaryNICInterface mtu 9000
ethtool -G $primaryNICInterface rx 2047 tx 2047 rx-jumbo 8191
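To confirm the ring sizes actually took effect (drivers silently clamp values to the hardware maximums they report), they can be read back; this assumes the same $primaryNICInterface variable as above:

ethtool -g $primaryNICInterface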
I read about changing interrupt coalescing, but I was unable to find which values should be changed, or whether it really helps.
# Several packets in a rapid sequence can be coalesced into one interrupt passed up to the CPU, providing more CPU time for application processing.
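For what it's worth, a minimal sketch of inspecting and adjusting coalescing with ethtool; the rx-usecs/tx-usecs values below are illustrative starting points, not validated recommendations, and some drivers reject manual values while adaptive moderation is enabled:

# show the driver's current coalescing settings
ethtool -c $primaryNICInterface
# option 1: let the driver adapt the interrupt rate to the load
ethtool -C $primaryNICInterface adaptive-rx on adaptive-tx on
# option 2: pin a fixed delay, trading latency for fewer interrupts
ethtool -C $primaryNICInterface rx-usecs 50 tx-usecs 50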
Thanks,
Pinkesh Valdria
Oracle Cloud
From: Jongwoo Han <jongwoohan at gmail.com>
Date: Wednesday, December 4, 2019 at 8:07 PM
To: Pinkesh Valdria <pinkesh.valdria at oracle.com>
Cc: Andreas Dilger <adilger at whamcloud.com>, "lustre-discuss at lists.lustre.org" <lustre-discuss at lists.lustre.org>
Subject: Re: [lustre-discuss] Lnet Self Test
Have you tried MTU >= 9000 bytes (AKA jumbo frame) on the 25G ethernet and the switch?
If it is set to 1500 bytes, the ethernet + IP + TCP headers consume a noticeable fraction of every packet, reducing the bandwidth available for data.
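An easy way to verify that 9000-byte frames actually pass end to end (NIC, switch, and peer) is a do-not-fragment ping sized to the MTU minus the 28 bytes of IP + ICMP headers; the peer address here is taken from the test setup below:

ping -M do -s 8972 -c 3 10.0.3.6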
Jongwoo Han
2019년 11월 28일 (목) 오전 3:44, Pinkesh Valdria <pinkesh.valdria at oracle.com>님이 작성:
Thanks Andreas for your response.
I ran another LNet self-test with 48 concurrent processes, since the nodes have 52 physical cores, and I was able to achieve the same throughput (2052.71 MiB/s = 2152 MB/s).
Is it expected to lose almost 600 MB/s (2750 - 2150 = 600) due to ethernet overheads with LNet?
Thanks,
Pinkesh Valdria
Oracle Cloud Infrastructure
From: Andreas Dilger <adilger at whamcloud.com>
Date: Wednesday, November 27, 2019 at 1:25 AM
To: Pinkesh Valdria <pinkesh.valdria at oracle.com>
Cc: "lustre-discuss at lists.lustre.org" <lustre-discuss at lists.lustre.org>
Subject: Re: [lustre-discuss] Lnet Self Test
The first thing to note is that lst reports results in binary units
(MiB/s) while iperf reports results in decimal units (Gbps). If you do the
conversion you get 2055.31 MiB/s = 2155 MB/s.
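Spelled out: 2055.31 MiB/s x 1,048,576 bytes/MiB ~= 2155 MB/s, which is about 17.2 Gbps, so the gap to the 22 Gbps iperf number is smaller than the raw figures suggest.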
The other thing to check is the CPU usage. For TCP the CPU usage can
be high. You should try RoCE+o2iblnd instead.
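A minimal sketch of what moving to o2iblnd looks like (assuming an RDMA-capable NIC with the RoCE stack already configured; the interface name ens3 is illustrative):

modprobe lnet
lnetctl lnet configure
# replace the tcp1 NID with an o2ib network on the RDMA interface
lnetctl net add --net o2ib --if ens3
lnetctl net show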
Cheers, Andreas
On Nov 26, 2019, at 21:26, Pinkesh Valdria <pinkesh.valdria at oracle.com> wrote:
Hello All,
I created a new Lustre cluster on CentOS 7.6 and I am running lnet_selftest_wrapper.sh to measure network throughput. The nodes are connected to each other with 25 Gbps ethernet, so the theoretical max is 25 Gbps * 125 = 3125 MB/s. Using iperf3, I get 22 Gbps (2750 MB/s) between the nodes.
[root at lustre-client-2 ~]# for c in 1 2 4 8 12 16 20 24 ; do echo $c ; ST=lst-output-$(date +%Y-%m-%d-%H:%M:%S) CN=$c SZ=1M TM=30 BRW=write CKSUM=simple LFROM="10.0.3.7 at tcp1" LTO="10.0.3.6 at tcp1" /root/lnet_selftest_wrapper.sh; done ;
When I run lnet_selftest_wrapper.sh (from the Lustre wiki) between 2 nodes, I get a max of 2055.31 MiB/s. Is that expected at the LNet level, or can I further tune the network and OS kernel (the tuning I applied is below) to get better throughput?
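For reference, the wrapper essentially drives a raw lst session like the following (a minimal sketch after the Lustre manual; the session, group, and batch names are arbitrary):

modprobe lnet_selftest
export LST_SESSION=$$
lst new_session rw
lst add_group lfrom 10.0.3.7@tcp1
lst add_group lto 10.0.3.6@tcp1
lst add_batch bulk
lst add_test --batch bulk --concurrency 16 --from lfrom --to lto brw write check=simple size=1M
lst run bulk
# sample counters for 30 seconds, then tear down
lst stat lfrom lto &
sleep 30
kill $!
lst end_session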
Result snippet from lnet_selftest_wrapper.sh:
[LNet Rates of lfrom]
[R] Avg: 4112 RPC/s Min: 4112 RPC/s Max: 4112 RPC/s
[W] Avg: 4112 RPC/s Min: 4112 RPC/s Max: 4112 RPC/s
[LNet Bandwidth of lfrom]
[R] Avg: 0.31 MiB/s Min: 0.31 MiB/s Max: 0.31 MiB/s
[W] Avg: 2055.30 MiB/s Min: 2055.30 MiB/s Max: 2055.30 MiB/s
[LNet Rates of lto]
[R] Avg: 4136 RPC/s Min: 4136 RPC/s Max: 4136 RPC/s
[W] Avg: 4136 RPC/s Min: 4136 RPC/s Max: 4136 RPC/s
[LNet Bandwidth of lto]
[R] Avg: 2055.31 MiB/s Min: 2055.31 MiB/s Max: 2055.31 MiB/s
[W] Avg: 0.32 MiB/s Min: 0.32 MiB/s Max: 0.32 MiB/s
Tuning applied:
Ethernet NICs:
ip link set dev ens3 mtu 9000
ethtool -G ens3 rx 2047 tx 2047 rx-jumbo 8191
less /etc/sysctl.conf
net.core.wmem_max=16777216
net.core.rmem_max=16777216
net.core.wmem_default=16777216
net.core.rmem_default=16777216
net.core.optmem_max=16777216
net.core.netdev_max_backlog=27000
kernel.sysrq=1
kernel.shmmax=18446744073692774399
net.core.somaxconn=8192
net.ipv4.tcp_adv_win_scale=2
net.ipv4.tcp_low_latency=1
net.ipv4.tcp_rmem = 212992 87380 16777216
net.ipv4.tcp_sack = 1
net.ipv4.tcp_timestamps = 1
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_wmem = 212992 65536 16777216
vm.min_free_kbytes = 65536
net.ipv4.tcp_congestion_control = cubic
net.ipv4.tcp_timestamps = 0
net.ipv4.tcp_congestion_control = htcp
net.ipv4.tcp_no_metrics_save = 0
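Note that sysctl.conf is applied top to bottom, so where a key appears twice above (tcp_timestamps, tcp_congestion_control) the later line wins: timestamps end up disabled and htcp is the active congestion control. To load the file and confirm:

sysctl -p /etc/sysctl.conf
sysctl net.ipv4.tcp_congestion_control net.ipv4.tcp_timestamps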
echo "#
# tuned configuration
#
[main]
summary=Broadly applicable tuning that provides excellent performance across a variety of common server workloads
[disk]
devices=!dm-*, !sda1, !sda2, !sda3
readahead=>4096
[cpu]
force_latency=1
governor=performance
energy_perf_bias=performance
min_perf_pct=100
[vm]
transparent_huge_pages=never
[sysctl]
kernel.sched_min_granularity_ns = 10000000
kernel.sched_wakeup_granularity_ns = 15000000
vm.dirty_ratio = 30
vm.dirty_background_ratio = 10
vm.swappiness=30
" > lustre-performance/tuned.conf
tuned-adm profile lustre-performance
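To check that the profile took effect (this assumes the tuned.conf above was written under /etc/tuned/lustre-performance/, which is where tuned-adm looks for custom profiles):

tuned-adm active
tuned-adm verify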
Thanks,
Pinkesh Valdria
_______________________________________________
lustre-discuss mailing list
lustre-discuss at lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
--
Jongwoo Han
+82-505-227-6108