[lustre-discuss] Very low write bandwidth
evancervj
evancervj at cdac.in
Tue Apr 15 23:41:00 PDT 2025
Hi,
I have been benchmarking Lustre with IOR on a 4-node cluster and have run into
an issue where the observed write bandwidth is significantly lower than the
read bandwidth. Below are the setup details for the cluster:
1. 1 MGS/MDS node with:
   1. Linux kernel 4.18.0-513.9.1.el8_lustre.x86_64
   2. 800 GB NVMe disk formatted as ldiskfs
   3. Lustre server v2.15.4
2. 2 OSS nodes, each with 1 OST, with:
   1. Linux kernel 4.18.0-513.9.1.el8_lustre.x86_64
   2. 800 GB NVMe disk formatted as ldiskfs
   3. Lustre server v2.15.4
3. 1 Lustre client with:
   1. Lustre v2.15.6
   2. Linux kernel 5.14.0-503.11.1.el9_5.x86_64
4. The default stripe settings are used (see the sketch after this list for how they were read):
   1. stripe_count: 1 stripe_size: 1048576 pattern: 0 stripe_offset: -1
5. Interconnected using a 56 Gbps Mellanox InfiniBand network
6. Contents of the /etc/modprobe.d/lustre.conf file:
   options lnet networks="o2ib(ib0)"
   options lnet lnet_transaction_timeout=100
   options lnet lnet_retry_count=2
   options ko2iblnd peer_credits=32
   options ko2iblnd peer_credits_hiw=16
   options ko2iblnd concurrent_sends=256
   options ksocklnd conns_per_peer=0
   options ost oss_num_threads=64
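The default layout in item 4 is as reported by lfs getstripe on the client; a minimal
sketch of the check, with /mnt/lustre as a placeholder for the actual mount point:

   # Show the default (directory) layout of the filesystem root; -d prints the
   # layout itself without recursing. /mnt/lustre is a placeholder mount point.
   lfs getstripe -d /mnt/lustre
   #   stripe_count: 1 stripe_size: 1048576 pattern: 0 stripe_offset: -1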
I conducted individual tests on each OST using obdfilter-survey. For
reference, the full summary outputs of these tests are attached.
* nobjlo=1 nobjhi=512 thrlo=1 thrhi=1024 size=480000 \
    rslt_loc=/var/tmp/obdfilter-survey_out targets="lustrefs-OST0001" case=disk \
    obdfilter-survey

  ost 1 sz 491520000K rsz 1024K obj 16 thr 16   write 3377.62 [1428.92, 186681.32] rewrite 154516.51 [147831.54, 186675.48] read 6977.51 [3370.55, 103311.36]
  ost 1 sz 491520000K rsz 1024K obj 16 thr 32   write 3661.83 [1510.83, 192783.49] rewrite 150708.13 [186337.79, 186337.79] read 6951.00 [2917.89, 59171.64]
  ost 1 sz 491520000K rsz 1024K obj 16 thr 64   write 3603.10 [1545.90, 213008.56] rewrite 172656.48 [177891.67, 177891.67] read 6984.14 [3352.78, 57702.04]
  ost 1 sz 491520000K rsz 1024K obj 16 thr 128  write 3692.16 [1594.80, 13478.11]  rewrite 149716.18 [106440.28, 225295.61] read 6850.52 [2804.80, 45156.82]
  ost 1 sz 491520000K rsz 1024K obj 16 thr 256  write 3661.13 [1446.88, 223403.23] rewrite 140771.55 [103769.40, 190108.76] read 6964.33 [3357.70, 85623.55]
  ost 1 sz 491257856K rsz 1024K obj 16 thr 512  write 3193.67 [1001.90, 205874.24] rewrite 137435.34 [104790.09, 180991.34] read 6938.31 [3358.61, 54319.14]
  ost 1 sz 490733568K rsz 1024K obj 16 thr 1024 write 2379.98 [454.94, 202684.59]  rewrite 130579.85 [100158.02, 161904.29] read 6945.24 [3354.17, 48807.91]
* nobjlo=1 nobjhi=512 thrlo=1 thrhi=1024 size=480000 \
    rslt_loc=/var/tmp/obdfilter-survey_out targets="lustrefs-OST0000" case=disk \
    obdfilter-survey

  ost 1 sz 491520000K rsz 1024K obj 16 thr 16   write 3747.17 [1393.84, 190306.68] rewrite 156040.83 [148205.37, 188453.46] read 7009.94 [3398.61, 108528.27]
  ost 1 sz 491520000K rsz 1024K obj 16 thr 32   write 3745.34 [1393.92, 193273.05] rewrite 154722.13 [177941.31, 177941.31] read 6989.40 [3330.82, 30959.14]
  ost 1 sz 491520000K rsz 1024K obj 16 thr 64   write 3760.65 [1367.83, 104560.10] rewrite 162225.64 [148197.30, 148197.30] read 6999.92 [3363.80, 60847.55]
  ost 1 sz 491520000K rsz 1024K obj 16 thr 128  write 3754.86 [1379.88, 56060.15]  rewrite 147814.31 [104369.56, 217353.01] read 6990.76 [3330.77, 53634.79]
  ost 1 sz 491520000K rsz 1024K obj 16 thr 256  write 3705.70 [1358.82, 150706.49] rewrite 138369.68 [101585.51, 182624.29] read 6962.05 [3337.70, 73858.34]
  ost 1 sz 491257856K rsz 1024K obj 16 thr 512  write 3612.06 [1275.87, 95958.05]  rewrite 134727.11 [105177.20, 172269.61] read 6986.99 [3350.63, 46219.95]
  ost 1 sz 490733568K rsz 1024K obj 16 thr 1024 write 2867.46 [537.87, 53084.22]   rewrite 129812.07 [102830.81, 159936.73] read 6987.93 [3335.35, 79355.00]
Network performance was evaluated across the cluster nodes using lnet_selftest,
yielding a bandwidth of approximately 6800 MB/s for both read and write
operations; a sketch of the test script is included below.
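This is a minimal sketch of the lnet_selftest run, assuming placeholder NIDs
10.0.0.1@o2ib / 10.0.0.2@o2ib for the server and client nodes (the actual NIDs
differ), with the lnet_selftest module loaded on every participating node:

   modprobe lnet_selftest                  # on every node taking part in the test
   export LST_SESSION=$$
   lst new_session rw_test
   lst add_group servers 10.0.0.1@o2ib     # placeholder server NID(s)
   lst add_group clients 10.0.0.2@o2ib     # placeholder client NID(s)
   lst add_batch bulk_rw
   lst add_test --batch bulk_rw --concurrency 8 --from clients --to servers brw write size=1M
   lst add_test --batch bulk_rw --concurrency 8 --from clients --to servers brw read size=1M
   lst run bulk_rw
   lst stat clients servers & sleep 30; kill $!    # sample bandwidth for ~30 s
   lst end_session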
I used IOR-4.0.0 to measure the read and write bandwidth of the setup with the
following commands. The output is attached for reference.
* mpirun -genvall -np 16 -ppn 16 -f /path_to_hostfile/hosts_rt05 /path_to_ior_bin/ior -F -w -r -e -g -C -w -b 1g -t 1m -i 4 -D 70 -vv -o ./out
  Max Write: 1712.75 MiB/sec (1795.95 MB/sec)
  Max Read:  83994.25 MiB/sec (88074.36 MB/sec)
* mpirun -genvall -np 16 -ppn 16 -f /path_to_hostfile/hosts_rt05 /path_to_ior_bin/ior -F -w -r -e -g -C -w -b 256m -t 1m -i 4 -D 70 -vv -o ./out
  Max Write: 1633.31 MiB/sec (1712.65 MB/sec)
  Max Read:  73826.50 MiB/sec (77412.69 MB/sec)
The observed write bandwidth of roughly 1,800 MB/s is significantly lower than
the read bandwidth of roughly 88,000 MB/s. Are there specific configurations
that could help improve write performance? Any suggestions or insights on
addressing this disparity would be greatly appreciated.
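If the client-side settings are relevant to the diagnosis, I can share them as
well; a minimal sketch of how I would collect the stock osc/llite parameters
that typically affect buffered write bandwidth (run on the client):

   lctl get_param osc.*.max_rpcs_in_flight   # RPCs in flight per OST
   lctl get_param osc.*.max_dirty_mb         # dirty page cache per OST
   lctl get_param osc.*.max_pages_per_rpc    # bulk RPC size
   lctl get_param osc.*.checksums            # wire checksums on/off
   lctl get_param llite.*.max_cached_mb      # client page cache limit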
Thanks
John
Attachments:
- mlx_uverbs_rt01_rt02_rt03_rt05_08-04-2025 (19150 bytes): <http://lists.lustre.org/pipermail/lustre-discuss-lustre.org/attachments/20250416/d8e88681/attachment-0003.obj>
- obdfilter_survey_2025-04-09 at 08:28_rt03.summary (10293 bytes): <http://lists.lustre.org/pipermail/lustre-discuss-lustre.org/attachments/20250416/d8e88681/attachment-0004.obj>
- obdfilter_survey_2025-04-09 at 17:58_rt02.summary (10303 bytes): <http://lists.lustre.org/pipermail/lustre-discuss-lustre.org/attachments/20250416/d8e88681/attachment-0005.obj>