[lustre-discuss] Very low write bandwidth

Simon Guilbault simon.guilbault at calculquebec.ca
Wed Apr 16 13:48:28 PDT 2025


Hi,
You are measuring the read cache of your client: a 56 Gb/s IB connection
can only deliver about 7 GB/s (consistent with the ~6800 MB/s you measured
with lnet_selftest), so the 88 GB/s read figure is bogus.

https://ior.readthedocs.io/en/latest/userDoc/tutorial.html#effect-of-page-cache-on-benchmarking

Drop the client page cache before the read phase, or use multiple clients so
that no client re-reads the data it has just written.
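
For example, something along these lines on a single client (a rough sketch:
the mount point and output path are placeholders, the remaining IOR flags are
taken from your command):

    # write-only pass, keeping the files with -k
    mpirun -np 16 ior -F -w -e -g -C -b 1g -t 1m -k -o /mnt/lustre/ior_out

    # drop the client page cache between the two phases (as root on the client)
    sync
    echo 3 > /proc/sys/vm/drop_caches
    # optionally also flush the Lustre client's DLM locks and their cached pages
    lctl set_param ldlm.namespaces.*.lru_size=clear

    # read-only pass over the same files
    mpirun -np 16 ior -F -r -e -g -C -b 1g -t 1m -o /mnt/lustre/ior_out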



On Wed, Apr 16, 2025 at 2:41 AM evancervj via lustre-discuss <
lustre-discuss at lists.lustre.org> wrote:

> Hi,
>
> I have been working on benchmarking Lustre with IOR on a 4-node cluster
> and have encountered an issue where the observed write bandwidth is
> significantly lower than the read bandwidth. Below are the setup details
> for the cluster:
>
>    1. 1 MGS/MDS node with:
>       1. Linux Kernel 4.18.0-513.9.1.el8_lustre.x86_64
>       2. 800 GB NVMe disk formatted as ldiskfs
>       3. Lustre server v2.15.4
>    2. 2 OSS nodes, each with 1 OST, with:
>       1. Linux Kernel 4.18.0-513.9.1.el8_lustre.x86_64
>       2. 800 GB NVMe disk formatted as ldiskfs
>       3. Lustre server v2.15.4
>    3. 1 Lustre client with:
>       1. Lustre v2.15.6
>       2. Linux Kernel 5.14.0-503.11.1.el9_5.x86_64
>    4. Default stripe settings are used (see the lfs sketch after the
>       module options below):
>       1. stripe_count: 1  stripe_size: 1048576  pattern: 0  stripe_offset: -1
>    5. Interconnected using a 56 Gb/s Mellanox InfiniBand network
>    6. Contents of the /etc/modprobe.d/lustre.conf file:
>
>             options lnet networks="o2ib(ib0)"
>
>             options lnet lnet_transaction_timeout=100
>
>             options lnet lnet_retry_count=2
>
>             options ko2iblnd peer_credits=32
>
>             options ko2iblnd peer_credits_hiw=16
>
>             options ko2iblnd concurrent_sends=256
>
>             options ksocklnd conns_per_peer=0
>
>             options ost oss_num_threads=64
>
>
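>             (For reference: a sketch of how the default layout above can be
>             inspected, or a test directory striped across both OSTs; the
>             mount point and directory below are placeholders.)
>
>             # show the filesystem-wide default layout
>             lfs getstripe -d /mnt/lustre
>             # stripe new files in a test directory over 2 OSTs with 1 MiB stripes
>             lfs setstripe -c 2 -S 1m /mnt/lustre/ior_test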
>
> I conducted individual tests on the OST nodes using obdfilter-survey. For
> reference, the full summary output of the test is attached.
>
>    -  nobjlo=1 nobjhi=512 thrlo=1 thrhi=1024 size=480000
>    rslt_loc=/var/tmp/obdfilter-survey_out targets="lustrefs-OST0001" case=disk
>    obdfilter-survey
>
> ost  1 sz 491520000K rsz 1024K obj   16 thr   16 write 3377.62 [1428.92,
> 186681.32] rewrite 154516.51 [147831.54, 186675.48] read 6977.51 [3370.55,
> 103311.36]
>
> ost  1 sz 491520000K rsz 1024K obj   16 thr   32 write 3661.83 [1510.83,
> 192783.49] rewrite 150708.13 [186337.79, 186337.79] read 6951.00 [2917.89,
> 59171.64]
>
> ost  1 sz 491520000K rsz 1024K obj   16 thr   64 write 3603.10 [1545.90,
> 213008.56] rewrite 172656.48 [177891.67, 177891.67] read 6984.14 [3352.78,
> 57702.04]
>
> ost  1 sz 491520000K rsz 1024K obj   16 thr  128 write 3692.16 [1594.80,
> 13478.11] rewrite 149716.18 [106440.28, 225295.61] read 6850.52 [2804.80,
> 45156.82]
>
> ost  1 sz 491520000K rsz 1024K obj   16 thr  256 write 3661.13 [1446.88,
> 223403.23] rewrite 140771.55 [103769.40, 190108.76] read 6964.33 [3357.70,
> 85623.55]
>
> ost  1 sz 491257856K rsz 1024K obj   16 thr  512 write 3193.67 [1001.90,
> 205874.24] rewrite 137435.34 [104790.09, 180991.34] read 6938.31 [3358.61,
> 54319.14]
>
> ost  1 sz 490733568K rsz 1024K obj   16 thr 1024 write 2379.98 [ 454.94,
> 202684.59] rewrite 130579.85 [100158.02, 161904.29] read 6945.24 [3354.17,
> 48807.91]
>
>
>
>    -  nobjlo=1 nobjhi=512 thrlo=1 thrhi=1024 size=480000
>    rslt_loc=/var/tmp/obdfilter-survey_out targets="lustrefs-OST0000" case=disk
>    obdfilter-survey
>
> ost  1 sz 491520000K rsz 1024K obj   16 thr   16 write 3747.17 [1393.84,
> 190306.68] rewrite 156040.83 [148205.37, 188453.46] read 7009.94 [3398.61,
> 108528.27]
>
> ost  1 sz 491520000K rsz 1024K obj   16 thr   32 write 3745.34 [1393.92,
> 193273.05] rewrite 154722.13 [177941.31, 177941.31] read 6989.40 [3330.82,
> 30959.14]
>
> ost  1 sz 491520000K rsz 1024K obj   16 thr   64 write 3760.65 [1367.83,
> 104560.10] rewrite 162225.64 [148197.30, 148197.30] read 6999.92 [3363.80,
> 60847.55]
>
> ost  1 sz 491520000K rsz 1024K obj   16 thr  128 write 3754.86 [1379.88,
> 56060.15] rewrite 147814.31 [104369.56, 217353.01] read 6990.76 [3330.77,
> 53634.79]
>
> ost  1 sz 491520000K rsz 1024K obj   16 thr  256 write 3705.70 [1358.82,
> 150706.49] rewrite 138369.68 [101585.51, 182624.29] read 6962.05 [3337.70,
> 73858.34]
>
> ost  1 sz 491257856K rsz 1024K obj   16 thr  512 write 3612.06 [1275.87,
> 95958.05] rewrite 134727.11 [105177.20, 172269.61] read 6986.99 [3350.63,
> 46219.95]
>
> ost  1 sz 490733568K rsz 1024K obj   16 thr 1024 write 2867.46 [ 537.87,
> 53084.22] rewrite 129812.07 [102830.81, 159936.73] read 6987.93 [3335.35,
> 79355.00]
>
>
>
> Network performance was evaluated across the cluster nodes using
> lnet_selftest, yielding a bandwidth of approximately 6800 MB/s for both
> read and write operations.
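>
> (A minimal lnet_selftest sketch of the kind used for this measurement; the
> NIDs below are placeholders, not the actual node addresses.)
>
>             # load the selftest module on every node involved first
>             modprobe lnet_selftest
>             export LST_SESSION=$$
>             lst new_session rw_test
>             lst add_group clients 10.0.0.1@o2ib
>             lst add_group servers 10.0.0.2@o2ib
>             lst add_batch bulk_rw
>             lst add_test --batch bulk_rw --from clients --to servers brw write size=1M
>             lst run bulk_rw
>             # sample statistics for 30 seconds, then tear down
>             lst stat servers clients & sleep 30; kill $!
>             lst end_session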
>
> I used IOR-4.0.0 to check the read and write bandwidth of the setup using
> the following commands. The outputs are attached for reference.
>
>    - mpirun -genvall -np 16 -ppn 16 -f /path_to_hostfile/hosts_rt05
>    /path_to_ior_bin/ior -F -w -r -e -g -C -w -b 1g -t 1m -i 4 -D 70 -vv -o
>    ./out
>
>             Max Write: 1712.75 MiB/sec (1795.95 MB/sec)
>
>             Max Read:  83994.25 MiB/sec (88074.36 MB/sec)
>
>    - mpirun -genvall -np 16 -ppn 16 -f /path_to_hostfile/hosts_rt05
>    /path_to_ior_bin/ior -F -w -r -e -g -C -w -b 256m -t 1m -i 4 -D 70 -vv -o
>    ./out
>
>             Max Write: 1633.31 MiB/sec (1712.65 MB/sec)
>
>             Max Read:  73826.50 MiB/sec (77412.69 MB/sec)
>
>
>
> The observed write bandwidth of 1800 MB/s is significantly lower than the
> read bandwidth of 88,000 MB/s. Are there specific configurations that could
> help enhance write performance? Any suggestions or insights on addressing
> this disparity would be greatly appreciated.
>
> Thanks
>
> John
>