[lustre-discuss] lustre-discuss Digest, Vol 235, Issue 29 strange pauses between writes, but not everywhere
John Bauer
bauerj at iodoctors.com
Tue Oct 28 15:08:12 PDT 2025
Peter,
This looks suspiciously similar to an issue you and I discussed a couple
of years ago, titled "Lustre caching and NUMA nodes" (December 6, 2023).
http://lists.lustre.org/pipermail/lustre-discuss-lustre.org/2023-December/018956.html
The application program, in my case dd, is simply doing a streaming
write, 8000 @ 1MB, reading from /dev/zero. For reasons that escape me,
the cached memory of all OSCs, for all file systems, intermittently gets
dropped, causing pauses in the dd write. The other non-Lustre cached
data does not get dropped. An image that depicts this is at:
https://www.dropbox.com/scl/fi/augs88r7lcdfd6wb7nwrf/pfe27_allOSC_cached.png?rlkey=ynaw60yknwmfjavfy5gxsuk76&e=1&dl=0
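
For reference, the test itself is nothing special, just a plain streaming
dd; the output path below is only a placeholder for whichever Lustre file
system you want to exercise:

    # 8000 x 1MB streaming write from /dev/zero
    dd if=/dev/zero of=/mnt/lustre/testfile bs=1M count=8000
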
I would be curious to see what behavior the OSC caching is exhibiting on
your compute node when you experience these write pauses.
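
A simple way to watch this is to poll the client cache counters once a
second while dd runs; this assumes your client exposes
osc.*.osc_cached_mb (the exact parameter name can vary between releases;
llite.*.max_cached_mb also reports a used_mb figure):

    # sample per-OSC and per-mount cached memory once a second
    while sleep 1; do
        date +%T
        lctl get_param 'osc.*.osc_cached_mb' 'llite.*.max_cached_mb'
    done
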
John
On 10/28/2025 3:03 PM, lustre-discuss-request at lists.lustre.org wrote:
>
> Today's Topics:
>
> 1. issue: strange pauses between writes, but not everywhere
> (Peter Grandi)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Tue, 28 Oct 2025 20:01:41 +0000
> From: pg at lustre.list.sabi.co.UK (Peter Grandi)
> To: list Linux fs Lustre <lustre-discuss at lists.Lustre.org>
> Subject: [lustre-discuss] issue: strange pauses between writes, but
> not everywhere
> Message-ID: <yf3ms5ahk2i.fsf at petal.ty.sabi.co.uk>
> Content-Type: text/plain
>
> So I have 3 Lustre storage clusters which have recently developed a
> strange issue:
>
> * Each cluster has 1 MDT with 3 "enterprise" SSDs and 8 OSTs each with 3
> "enterprise" SSDs, MDT and OSTs all done with ZFS, on top of Alma
> 8.10. Lustre version is 2.15.7. Each server is pretty overspecified
> (28 cores, 128GiB), 100Gb/s cards and switch, and the clients are the
> same as the servers except they run the client version of the Lustre
> 2.16.1 drivers.
>
> * For an example I will use the Lustre 'temp01' where the servers have
> addresses 192.168.102.40-48, where .40 is the MDT, and the clients
> have addresses 192.168.102.13-36.
>
> * Reading is quite good for all clients. But since yesterday early
> afternoon the clients .13-36 have inexplicably had a maximum average
> write speed of around 35-40MB/s; yet if I mount 'temp01' on any of the
> Lustre servers (and I usually have it mounted on the MDT .40), write
> rates are as good as before. Mysteriously, today one of the clients
> (.14) wrote at the previous good speeds for a while and then reverted
> to slow. I was tweaking some '/proc/sys/net/ipv4/tcp_*' parameters at
> the time, but the same parameters on .13 did not improve the
> situation.
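
One thing that may help when comparing .14 against .13 is to dump the
same set of TCP knobs on both clients and diff the output; the parameters
below are just examples, not an exhaustive list:

    # print a few ipv4 TCP tunables so two clients can be compared
    for p in tcp_rmem tcp_wmem tcp_window_scaling tcp_sack; do
        printf '%s: ' "$p"; cat "/proc/sys/net/ipv4/$p"
    done
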
>
> * I have collected 'tcpdump' traces on all the 'temp01' servers and a
> client while writing and examined them with Wireshark's "TCP Stream
> Graphs" (etc.), and what is happening is that the clients send at full
> speed for a little while, then pause for around 2-3 seconds, and then
> resume. The servers, when accessing 'temp01' as clients, do not pause.
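
If you capture again, it may be worth limiting the trace to the Lustre
traffic; socklnd uses TCP port 988 by default (the interface name below
is a placeholder for the 100Gb/s NIC):

    # capture only Lustre (socklnd) traffic, truncated to headers
    tcpdump -i eth0 -s 128 -w temp01-client.pcap port 988
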
>
> * If I use NFS Ganesha with NFSv4-over-TCP on the MDT exporting 'temp01'
> I can write to that at high rates (not as high as with native Lustre
> of course).
>
> * I have used 'iperf3' to check basic network rates and for "reasons"
> they are around 25-30Gb/s, but that is still much higher than the
> observed *average* write speeds.
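
A single iperf3 stream often tops out well below line rate on 100Gb/s
links, so a few parallel streams give a more realistic ceiling, e.g.
against the MDT address from the example above:

    # server side:  iperf3 -s
    # client side:  4 parallel streams for 30 seconds
    iperf3 -c 192.168.102.40 -P 4 -t 30
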
>
> * The issue persists after rebooting the clients (I have not rebooted
> all the servers of at least one cluster, but I recently rebooted one
> of the MDTs).
>
> * I have checked the relevant switch logs and ports and there are no
> obvious errors or significant packet error/drop rates.
>
> My current guesses are some issue with IP flow control or TCP window
> size, but bare TCP with 'iperf3' and NFSv4-over-TCP both give good
> rates. So perhaps it is something weird with receive pacing in the
> Lustre LNET driver.
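
If it does turn out to be on the Lustre/LNET side rather than bare TCP,
the client-side RPC and dirty-cache settings are also worth a look; the
parameter names below are as of recent 2.15/2.16 clients:

    # per-OSC RPC concurrency and dirty-page limits on a client
    lctl get_param 'osc.*.max_rpcs_in_flight' 'osc.*.max_dirty_mb'
    # LNET network and peer configuration, including socklnd credits
    lnetctl net show -v
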
>
> Please let me know if you have seen something similar, or if you have
> other suggestions.
>
>