[lustre-discuss] Degraded read performance with Large Bulk IO (16MB RPC)
Pinkesh Valdria
pinkesh.valdria at oracle.com
Wed Dec 11 08:45:57 PST 2019
I was not able to find those parameters on my client nodes, OSS nodes, or MGS node. Here is how I was extracting all parameters:
mkdir -p lctl_list_param_R/
cd lctl_list_param_R/
lctl list_param -R * > lctl_list_param_R
[opc at lustre-client-1 lctl_list_param_R]$ less lctl_list_param_R | grep ahead
llite.lfsbv-ffff98231c3bc000.statahead_agl
llite.lfsbv-ffff98231c3bc000.statahead_max
llite.lfsbv-ffff98231c3bc000.statahead_running_max
llite.lfsnvme-ffff98232c30e000.statahead_agl
llite.lfsnvme-ffff98232c30e000.statahead_max
llite.lfsnvme-ffff98232c30e000.statahead_running_max
[opc at lustre-client-1 lctl_list_param_R]$
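(Side note: quoting the wildcard keeps the shell from expanding it against files already in the directory, such as the output file from a previous run, so something like this should be safer:)
lctl list_param -R '*' > lctl_list_param_R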
I also tried these commands:
Not working:
On client nodes
lctl get_param llite.lfsbv-*.max_read_ahead_mb
error: get_param: param_path 'llite/lfsbv-*/max_read_ahead_mb': No such file or directory
[opc at lustre-client-1 lctl_list_param_R]$
Works
On client nodes
lctl get_param llite.*.statahead_agl
llite.lfsbv-ffff98231c3bc000.statahead_agl=1
llite.lfsnvme-ffff98232c30e000.statahead_agl=1
[opc at lustre-client-1 lctl_list_param_R]$
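(Perhaps a direct pattern search would show whether the read-ahead tunables exist at all on this Lustre release; I have not verified this on these nodes:)
lctl list_param llite.*.max_read_ahead*
lctl get_param llite.*.max_read_ahead_mb llite.*.max_read_ahead_per_file_mb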
From: "Moreno Diego (ID SIS)" <diego.moreno at id.ethz.ch>
Date: Tuesday, December 10, 2019 at 2:06 AM
To: Pinkesh Valdria <pinkesh.valdria at oracle.com>, "lustre-discuss at lists.lustre.org" <lustre-discuss at lists.lustre.org>
Subject: Re: [lustre-discuss] Degraded read performance with Large Bulk IO (16MB RPC)
With that kind of read performance degradation I would immediately think of llite’s max_read_ahead parameters on the client. Specifically these two:
max_read_ahead_mb: total amount of MB allocated for read-ahead; usually quite low for bandwidth benchmarking purposes and when there are several files per client
max_read_ahead_per_file_mb: the default is quite low for 16MB RPCs (only a few RPCs per file)
You probably need to check the effect of increasing both of them, for example:
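On a client, something along these lines (the values are only illustrative; the right numbers depend on client RAM and how many files are read in parallel):
lctl get_param llite.*.max_read_ahead_mb llite.*.max_read_ahead_per_file_mb
lctl set_param llite.*.max_read_ahead_mb=1024
lctl set_param llite.*.max_read_ahead_per_file_mb=256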
Regards,
Diego
From: lustre-discuss <lustre-discuss-bounces at lists.lustre.org> on behalf of Pinkesh Valdria <pinkesh.valdria at oracle.com>
Date: Tuesday, 10 December 2019 at 09:40
To: "lustre-discuss at lists.lustre.org" <lustre-discuss at lists.lustre.org>
Subject: [lustre-discuss] Degraded read performance with Large Bulk IO (16MB RPC)
I was expecting better or the same read performance with Large Bulk IO (16MB RPC), but I see a degradation in performance. Do I need to tune any other parameters to benefit from Large Bulk IO? I would appreciate any pointers to troubleshoot further.
Throughput before
- Read: 2563 MB/s
- Write: 2585 MB/s
Throughput after
- Read: 1527 MB/s (down by ~1025 MB/s)
- Write: 2859 MB/s
The changes I made are:
On the OSS nodes
- lctl set_param obdfilter.lfsbv-*.brw_size=16
On clients
- unmounted and remounted
- lctl set_param osc.lfsbv-OST*.max_pages_per_rpc=4096 (got auto-updated after re-mount)
- lctl set_param osc.*.max_rpcs_in_flight=64 (had to manually increase this to 64, since after the re-mount it was auto-set to 8 and read/write performance was poor)
- lctl set_param osc.*.max_dirty_mb=2040 (setting the value to 2048 failed with a "Numerical result out of range" error; previously it was set to 2000 when I got good performance)
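To confirm that the clients are actually issuing 16MB RPCs after these changes, I believe the per-OSC RPC histogram can be checked (lfsbv is my filesystem name; writing 0 should, as far as I know, reset the counters):
lctl set_param osc.lfsbv-OST*.rpc_stats=0
(run the read test)
lctl get_param osc.lfsbv-OST*.rpc_stats
The "pages per rpc" column should then peak around 4096 pages (16MB with 4KB pages) if the large RPCs are being used.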
My other settings:
- lnetctl net add --net tcp1 --if $interface --peer-timeout 180 --peer-credits 128 --credits 1024
- echo "options ksocklnd nscheds=10 sock_timeout=100 credits=2560 peer_credits=63 enable_irq_affinity=0" > /etc/modprobe.d/ksocklnd.conf
- lfs setstripe -c 1 -S 1M /mnt/mdt_bv/test1
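(If it helps, the resulting LNet and stripe settings can be dumped with the standard commands:)
lnetctl net show
lfs getstripe /mnt/mdt_bv/test1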