[lustre-discuss] Lnet Self Test

Jongwoo Han jongwoohan at gmail.com
Wed Dec 4 20:07:41 PST 2019


Have you tried MTU >= 9000 bytes (AKA jumbo frames) on both the 25G ethernet
NICs and the switch?
If it is set to 1500 bytes, the ethernet + IP + TCP headers take up a
noticeable share of every packet, reducing the bandwidth available for data.
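
A quick way to confirm that jumbo frames work end to end (assuming the
interface is ens3 and the peer is 10.0.3.6, as in the test below) is:

  ip link set dev ens3 mtu 9000
  ping -M do -s 8972 10.0.3.6

The -M do flag forbids fragmentation and -s 8972 is 9000 bytes minus the
28 bytes of IP + ICMP headers, so the ping only succeeds if every hop,
including the switch, accepts 9000-byte frames.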

Jongwoo Han

On Thu, Nov 28, 2019 at 3:44 AM, Pinkesh Valdria <pinkesh.valdria at oracle.com>
wrote:

> Thanks Andreas for your response.
>
>
>
> I ran another LNet Self Test with 48 concurrent processes, since the nodes
> have 52 physical cores, and I achieved the same throughput (2052.71
> MiB/s = 2152 MB/s).
>
>
>
> Is it expected to lose almost 600 MB/s (2750 - 2150 = 600 MB/s) to overhead
> on ethernet with LNet?
>
>
>
>
>
> Thanks,
>
> Pinkesh Valdria
>
> Oracle Cloud Infrastructure
>
>
> From: Andreas Dilger <adilger at whamcloud.com>
> Date: Wednesday, November 27, 2019 at 1:25 AM
> To: Pinkesh Valdria <pinkesh.valdria at oracle.com>
> Cc: "lustre-discuss at lists.lustre.org" <lustre-discuss at lists.lustre.org>
> Subject: Re: [lustre-discuss] Lnet Self Test
>
>
>
> The first thing to note is that lst reports results in binary units
> (MiB/s) while iperf reports results in decimal units (Gbps).  If you do the
> conversion you get 2055.31 MiB/s = 2155 MB/s.
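>
> (To spell out that conversion: 2055.31 MiB/s x 1,048,576 bytes/MiB
> / 1,000,000 bytes/MB is roughly 2155.2 MB/s, i.e. about 17.2 Gbit/s of
> payload on the 25 Gbit/s link.)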
>
>
> The other thing to check is the CPU usage.  For TCP the CPU usage can
> be high.  You should try RoCE+o2iblnd instead.
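>
> (A minimal sketch of that switch, assuming a RoCE-capable NIC with the
> RDMA drivers loaded and the same ens3 interface used elsewhere in this
> thread:
>
>   lnetctl lnet configure
>   lnetctl net add --net o2ib0 --if ens3
>   lnetctl net show
>
> The nodes would then talk over an o2ib0 LNet network instead of tcp1, so
> the NIDs used for the lst groups change accordingly.)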
>
>
>
> Cheers, Andreas
>
>
> On Nov 26, 2019, at 21:26, Pinkesh Valdria <pinkesh.valdria at oracle.com>
> wrote:
>
> Hello All,
>
>
>
> I created a new Lustre cluster on CentOS 7.6 and I am running
> lnet_selftest_wrapper.sh to measure throughput on the network.  The nodes
> are connected to each other using 25 Gbps Ethernet, so the theoretical max is
> 25 Gbps * 125 = 3125 MB/s.  Using iperf3, I get 22 Gbps (2750 MB/s) between
> the nodes.
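>
> (For reference, a throughput number like that typically comes from a
> multi-stream run, e.g. "iperf3 -s" on 10.0.3.6 and
> "iperf3 -c 10.0.3.6 -P 8 -t 30" on 10.0.3.7, where -P 8 opens 8 parallel
> TCP streams and -t 30 runs the test for 30 seconds; the exact command used
> here may differ.)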
>
>
>
>
>
> [root at lustre-client-2 ~]# for c in 1 2 4 8 12 16 20 24 ;  do echo $c ;
> ST=lst-output-$(date +%Y-%m-%d-%H:%M:%S)  CN=$c  SZ=1M  TM=30 BRW=write
> CKSUM=simple LFROM="10.0.3.7 at tcp1" LTO="10.0.3.6 at tcp1"
> /root/lnet_selftest_wrapper.sh; done ;
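>
> (If I am reading the wiki wrapper correctly, those environment variables
> map onto lst options roughly as follows:
>
>   ST=...       # label for the session / output file
>   CN=$c        # concurrency, i.e. RPCs in flight per test
>   SZ=1M        # I/O size of each bulk transfer
>   TM=30        # run time of the test in seconds
>   BRW=write    # direction of the brw test (read or write)
>   CKSUM=simple # checksum type for the brw test
>   LFROM/LTO    # NIDs of the source and target test groups
> )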
>
>
>
> When I run lnet_selftest_wrapper.sh (from the Lustre wiki
> <http://wiki.lustre.org/LNET_Selftest>) between 2 nodes, I get a max of
> 2055.31 MiB/s.  Is that expected at the LNet level?  Or can I further tune
> the network and OS kernel (the tuning I applied is below) to get better
> throughput?
>
>
>
>
>
>
>
> *Result Snippet from lnet_selftest_wrapper.sh*
>
>
>
> [LNet Rates of lfrom]
> [R] Avg: 4112     RPC/s Min: 4112     RPC/s Max: 4112     RPC/s
> [W] Avg: 4112     RPC/s Min: 4112     RPC/s Max: 4112     RPC/s
> [LNet Bandwidth of lfrom]
> [R] Avg: 0.31     MiB/s Min: 0.31     MiB/s Max: 0.31     MiB/s
> [W] Avg: 2055.30  MiB/s Min: 2055.30  MiB/s Max: 2055.30  MiB/s
> [LNet Rates of lto]
> [R] Avg: 4136     RPC/s Min: 4136     RPC/s Max: 4136     RPC/s
> [W] Avg: 4136     RPC/s Min: 4136     RPC/s Max: 4136     RPC/s
> [LNet Bandwidth of lto]
> [R] Avg: 2055.31  MiB/s Min: 2055.31  MiB/s Max: 2055.31  MiB/s
> [W] Avg: 0.32     MiB/s Min: 0.32     MiB/s Max: 0.32     MiB/s
>
>
>
>
>
> *Tuning applied: *
>
> *Ethernet NICs: *
>
> ip link set dev ens3 mtu 9000
>
> ethtool -G ens3 rx 2047 tx 2047 rx-jumbo 8191
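>
> (One thing worth double-checking, assuming a stock CentOS 7
> network-scripts setup: "ethtool -g ens3" shows whether the requested ring
> sizes were actually applied, and adding MTU=9000 to
> /etc/sysconfig/network-scripts/ifcfg-ens3 keeps the jumbo-frame setting
> across reboots.)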
>
>
>
>
>
> *less /etc/sysctl.conf*
>
> net.core.wmem_max=16777216
> net.core.rmem_max=16777216
> net.core.wmem_default=16777216
> net.core.rmem_default=16777216
> net.core.optmem_max=16777216
> net.core.netdev_max_backlog=27000
> kernel.sysrq=1
> kernel.shmmax=18446744073692774399
> net.core.somaxconn=8192
> net.ipv4.tcp_adv_win_scale=2
> net.ipv4.tcp_low_latency=1
> net.ipv4.tcp_rmem = 212992 87380 16777216
> net.ipv4.tcp_sack = 1
> net.ipv4.tcp_timestamps = 1
> net.ipv4.tcp_window_scaling = 1
> net.ipv4.tcp_wmem = 212992 65536 16777216
> vm.min_free_kbytes = 65536
> net.ipv4.tcp_congestion_control = cubic
> net.ipv4.tcp_timestamps = 0
> net.ipv4.tcp_congestion_control = htcp
> net.ipv4.tcp_no_metrics_save = 0
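>
> (These are applied with "sysctl -p /etc/sysctl.conf" or at the next boot;
> since the file is read top to bottom, the duplicated tcp_timestamps and
> tcp_congestion_control entries end up with the later values, 0 and htcp.)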
>
>
>
>
>
>
>
> echo "#
>
> *# tuned configuration*
>
> *#*
>
> [main]
>
> summary=Broadly applicable tuning that provides excellent performance
> across a variety of common server workloads
>
>
>
> [disk]
>
> devices=!dm-*, !sda1, !sda2, !sda3
>
> readahead=>4096
>
>
>
> [cpu]
>
> force_latency=1
>
> governor=performance
>
> energy_perf_bias=performance
>
> min_perf_pct=100
>
> [vm]
>
> transparent_huge_pages=never
>
> [sysctl]
>
> kernel.sched_min_granularity_ns = 10000000
>
> kernel.sched_wakeup_granularity_ns = 15000000
>
> vm.dirty_ratio = 30
>
> vm.dirty_background_ratio = 10
>
> vm.swappiness=30
>
> " > lustre-performance/tuned.conf
>
>
>
> tuned-adm profile lustre-performance
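>
> (Assuming the file above was written to /etc/tuned/lustre-performance/tuned.conf,
> "tuned-adm active" and "tuned-adm verify" will confirm whether the profile
> took effect.)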
>
>
>
>
>
> Thanks,
>
> Pinkesh Valdria
>
>
>
> _______________________________________________
> lustre-discuss mailing list
> lustre-discuss at lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>


-- 
Jongwoo Han
+82-505-227-6108

