[lustre-discuss] Degraded read performance with Large Bulk IO (16MB RPC)
Pinkesh Valdria
pinkesh.valdria at oracle.com
Wed Jan 22 00:41:04 PST 2020
To close the loop on this topic.
The parameters below were not set by default and hence were not showing up in the lctl list_param output; I had to set them first.
lctl set_param llite.*.max_read_ahead_mb=256
lctl set_param llite.*.max_read_ahead_per_file_mb=256
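To verify the values took effect I read them back on the client, and to make them survive remounts the persistent form of set_param (run on the MGS) can be used as well. This is only a sketch with the values that worked for my setup, not a general recommendation:
lctl get_param llite.*.max_read_ahead_mb llite.*.max_read_ahead_per_file_mb
lctl set_param -P llite.*.max_read_ahead_mb=256          # persistent variant, run on the MGS node
lctl set_param -P llite.*.max_read_ahead_per_file_mb=256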
Thanks to the Lustre community's help with tuning, I was able to tune Lustre on Oracle Cloud Infrastructure to get good performance on bare metal nodes with a 2x25 Gbps network. We have open-sourced the deployment of Lustre on Oracle Cloud, along with all the performance tuning done at both the infrastructure level and the Lustre filesystem level, for everyone to benefit from.
https://github.com/oracle-quickstart/oci-lustre
Terraform files are in: https://github.com/oracle-quickstart/oci-lustre/tree/master/terraform
Tuning scripts are in this folder: https://github.com/oracle-quickstart/oci-lustre/tree/master/scripts
As a next step, I plan to test deploying Lustre on a 100 Gbps RoCEv2 RDMA network (Mellanox CX5).
Thanks,
Pinkesh Valdria
Oracle Cloud – Principal Solutions Architect
https://blogs.oracle.com/cloud-infrastructure/lustre-file-system-performance-on-oracle-cloud-infrastructure
https://blogs.oracle.com/author/pinkesh-valdria
From: lustre-discuss <lustre-discuss-bounces at lists.lustre.org> on behalf of Pinkesh Valdria <pinkesh.valdria at oracle.com>
Date: Friday, December 13, 2019 at 11:14 AM
To: "Moreno Diego (ID SIS)" <diego.moreno at id.ethz.ch>, "lustre-discuss at lists.lustre.org" <lustre-discuss at lists.lustre.org>
Subject: Re: [lustre-discuss] Degraded read performance with Large Bulk IO (16MB RPC)
I ran the latest command you provided and it does not show the parameter, as you can see below. I can do a screenshare.
[opc at lustre-client-1 ~]$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda3 39G 2.5G 36G 7% /
devtmpfs 158G 0 158G 0% /dev
tmpfs 158G 0 158G 0% /dev/shm
tmpfs 158G 17M 158G 1% /run
tmpfs 158G 0 158G 0% /sys/fs/cgroup
/dev/sda1 512M 12M 501M 3% /boot/efi
10.0.3.6@tcp1:/lfsbv 50T 89M 48T 1% /mnt/mdt_bv
10.0.3.6@tcp1:/lfsnvme 185T 8.7M 176T 1% /mnt/mdt_nvme
tmpfs 32G 0 32G 0% /run/user/1000
[opc at lustre-client-1 ~]$ lctl list_param -R llite | grep max_read_ahead
[opc at lustre-client-1 ~]$
So I ran this:
[opc at lustre-client-1 ~]$ lctl list_param -R llite > llite_parameters.txt
There are other parameters under llite. I attached the complete list.
From: "Moreno Diego (ID SIS)" <diego.moreno at id.ethz.ch>
Date: Friday, December 13, 2019 at 8:36 AM
To: Pinkesh Valdria <pinkesh.valdria at oracle.com>, "lustre-discuss at lists.lustre.org" <lustre-discuss at lists.lustre.org>
Subject: Re: [lustre-discuss] Degraded read performance with Large Bulk IO (16MB RPC)
From what I can see, I think you just ran the wrong command (lctl list_param -R *), or it doesn't work as you expected on 2.12.3.
But the llite params are definitely there on a *mounted* Lustre client.
This will give you the parameters you're looking for and will likely need to modify for better read performance:
lctl list_param -R llite | grep max_read_ahead
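Once they show up, it is worth reading the current values before changing anything (just a sketch; the defaults depend on your client version):
lctl get_param llite.*.max_read_ahead_mb             # total read-ahead budget per client mount, in MB
lctl get_param llite.*.max_read_ahead_per_file_mb    # per-file read-ahead budget, in MB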
From: Pinkesh Valdria <pinkesh.valdria at oracle.com>
Date: Friday, 13 December 2019 at 17:33
To: "Moreno Diego (ID SIS)" <diego.moreno at id.ethz.ch>, "lustre-discuss at lists.lustre.org" <lustre-discuss at lists.lustre.org>
Subject: Re: [lustre-discuss] Degraded read performance with Large Bulk IO (16MB RPC)
This is how I installed the Lustre clients (only showing the package installation steps).
cat > /etc/yum.repos.d/lustre.repo << EOF
[hpddLustreserver]
name=CentOS- - Lustre
baseurl=https://downloads.whamcloud.com/public/lustre/latest-release/el7/server/
gpgcheck=0
[e2fsprogs]
name=CentOS- - Ldiskfs
baseurl=https://downloads.whamcloud.com/public/e2fsprogs/latest/el7/
gpgcheck=0
[hpddLustreclient]
name=CentOS- - Lustre
baseurl=https://downloads.whamcloud.com/public/lustre/latest-release/el7/client/
gpgcheck=0
EOF
yum install lustre-client -y
reboot
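After the reboot I do a quick sanity check that the client packages and modules are in place before mounting. The NID and fsname below are from my own setup, so treat this as a sketch:
rpm -qa | grep lustre                                # expect lustre-client and kmod-lustre-client
modprobe lustre && lsmod | grep -E 'lustre|lnet'     # client modules load cleanly
mount -t lustre 10.0.3.6@tcp1:/lfsbv /mnt/mdt_bv     # mount using the MGS NID and fsname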
From: "Moreno Diego (ID SIS)" <diego.moreno at id.ethz.ch>
Date: Friday, December 13, 2019 at 2:55 AM
To: Pinkesh Valdria <pinkesh.valdria at oracle.com>, "lustre-discuss at lists.lustre.org" <lustre-discuss at lists.lustre.org>
Subject: Re: [lustre-discuss] Degraded read performance with Large Bulk IO (16MB RPC)
From what I can see, they exist on my 2.12.3 client node:
[root at rufus4 ~]# lctl list_param -R llite | grep max_read_ahead
llite.reprofs-ffff9f7c3b4a8800.max_read_ahead_mb
llite.reprofs-ffff9f7c3b4a8800.max_read_ahead_per_file_mb
llite.reprofs-ffff9f7c3b4a8800.max_read_ahead_whole_mb
Regards,
Diego
From: Pinkesh Valdria <pinkesh.valdria at oracle.com>
Date: Wednesday, 11 December 2019 at 17:46
To: "Moreno Diego (ID SIS)" <diego.moreno at id.ethz.ch>, "lustre-discuss at lists.lustre.org" <lustre-discuss at lists.lustre.org>
Subject: Re: [lustre-discuss] Degraded read performance with Large Bulk IO (16MB RPC)
I was not able to find those parameters on my client, OSS, or MGS nodes. Here is how I was extracting all the parameters:
mkdir -p lctl_list_param_R/
cd lctl_list_param_R/
lctl list_param -R * > lctl_list_param_R
[opc at lustre-client-1 lctl_list_param_R]$ less lctl_list_param_R | grep ahead
llite.lfsbv-ffff98231c3bc000.statahead_agl
llite.lfsbv-ffff98231c3bc000.statahead_max
llite.lfsbv-ffff98231c3bc000.statahead_running_max
llite.lfsnvme-ffff98232c30e000.statahead_agl
llite.lfsnvme-ffff98232c30e000.statahead_max
llite.lfsnvme-ffff98232c30e000.statahead_running_max
[opc at lustre-client-1 lctl_list_param_R]$
I also tried these commands:
Not working:
On client nodes
lctl get_param llite.lfsbv-*.max_read_ahead_mb
error: get_param: param_path 'llite/lfsbv-*/max_read_ahead_mb': No such file or directory
[opc at lustre-client-1 lctl_list_param_R]$
Works
On client nodes
lctl get_param llite.*.statahead_agl
llite.lfsbv-ffff98231c3bc000.statahead_agl=1
llite.lfsnvme-ffff98232c30e000.statahead_agl=1
[opc at lustre-client-1 lctl_list_param_R]$
From: "Moreno Diego (ID SIS)" <diego.moreno at id.ethz.ch>
Date: Tuesday, December 10, 2019 at 2:06 AM
To: Pinkesh Valdria <pinkesh.valdria at oracle.com>, "lustre-discuss at lists.lustre.org" <lustre-discuss at lists.lustre.org>
Subject: Re: [lustre-discuss] Degraded read performance with Large Bulk IO (16MB RPC)
With that kind of read performance degradation I would immediately think of llite's max_read_ahead parameters on the client, specifically these two:
max_read_ahead_mb: total amount of MB allocated for read-ahead, usually quite low for bandwidth benchmarking purposes and when there are several files per client
max_read_ahead_per_file_mb: the default is quite low for 16MB RPCs (only a few RPCs per file)
You probably need to check the effect of increasing both of them.
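To see whether read-ahead is actually the limiting factor, the client also keeps per-mount counters you can compare before and after the change (a sketch; exact counter names vary a bit between releases):
lctl get_param llite.*.read_ahead_stats                                        # read-ahead hit/miss counters since mount
lctl get_param llite.*.max_read_ahead_mb llite.*.max_read_ahead_per_file_mb    # current limits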
Regards,
Diego
From: lustre-discuss <lustre-discuss-bounces at lists.lustre.org> on behalf of Pinkesh Valdria <pinkesh.valdria at oracle.com>
Date: Tuesday, 10 December 2019 at 09:40
To: "lustre-discuss at lists.lustre.org" <lustre-discuss at lists.lustre.org>
Subject: [lustre-discuss] Degraded read performance with Large Bulk IO (16MB RPC)
I was expecting better or at least the same read performance with Large Bulk IO (16MB RPC), but I see a degradation in performance. Do I need to tune any other parameter to benefit from Large Bulk IO? I would appreciate any pointers to troubleshoot further.
Throughput before
- Read: 2563 MB/s
- Write: 2585 MB/s
Throughput after
- Read: 1527 MB/s (down by ~1036 MB/s)
- Write: 2859 MB/s
The changes I made are:
On oss
- lctl set_param obdfilter.lfsbv-*.brw_size=16
On clients
- unmounted and remounted
- lctl set_param osc.lfsbv-OST*.max_pages_per_rpc=4096 (got auto-updated after re-mount)
- lctl set_param osc.*.max_rpcs_in_flight=64 (had to manually increase this to 64, since after the re-mount it was auto-set to 8 and read/write performance was poor)
- lctl set_param osc.*.max_dirty_mb=2040 (setting the value to 2048 failed with a "Numerical result out of range" error; previously it was set to 2000 when I got good performance). A quick check of the RPC size actually in use is shown below.
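For reference, this is how I check the RPC size the client is actually using (just a sketch; lfsbv is my fsname, adjust for yours):
lctl get_param osc.lfsbv-OST*.rpc_stats                                          # pages-per-RPC histogram, 4096 pages = 16MB
lctl get_param osc.lfsbv-OST*.max_pages_per_rpc osc.lfsbv-OST*.max_rpcs_in_flight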
My other settings:
- lnetctl net add --net tcp1 --if $interface --peer-timeout 180 --peer-credits 128 --credits 1024
- echo "options ksocklnd nscheds=10 sock_timeout=100 credits=2560 peer_credits=63 enable_irq_affinity=0" > /etc/modprobe.d/ksocklnd.conf
- lfs setstripe -c 1 -S 1M /mnt/mdt_bv/test1
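And this is roughly how I double-check those settings took effect (again just a sketch; output formats differ between releases):
lnetctl net show -v                  # per-NI tunables: peer_timeout, peer_credits, credits
cat /etc/modprobe.d/ksocklnd.conf    # ksocklnd options applied at module load time
lfs getstripe /mnt/mdt_bv/test1      # confirm stripe count 1 and stripe size 1M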