[lustre-discuss] lustre-discuss Digest, Vol 235, Issue 29 strange pauses between writes, but not everywhere
John Bauer
bauerj at iodoctors.com
Tue Oct 28 15:08:12 PDT 2025
Peter,
This looks suspiciously similar to an issue you and I discussed a couple
of years ago, titled "Lustre caching and NUMA nodes" (December 6, 2023).
http://lists.lustre.org/pipermail/lustre-discuss-lustre.org/2023-December/018956.html
The application program, in my case dd, is simply doing a streaming
write, 8000 @ 1MB, reading from /dev/zero. For reasons that escape me,
the cached memory of all OSCs, for all file systems, intermittently gets
dropped, causing pauses in the dd write. The other non-Lustre cached
data does not get dropped. An image that depicts this is at:
https://www.dropbox.com/scl/fi/augs88r7lcdfd6wb7nwrf/pfe27_allOSC_cached.png?rlkey=ynaw60yknwmfjavfy5gxsuk76&e=1&dl=0
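
For reference, the test itself is nothing special, just a plain streaming
dd; the output path below is only a placeholder for whichever Lustre file
system you want to exercise:

    # 8000 x 1MB streaming write from /dev/zero
    dd if=/dev/zero of=/mnt/lustre/testfile bs=1M count=8000
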
I would be curious to see what behavior the OSC caching is exhibiting on
your compute node when you experience these write pauses.
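
A simple way to watch this is to poll the client cache counters once a
second while dd runs; this assumes your client exposes
osc.*.osc_cached_mb (the exact parameter name can vary between releases;
llite.*.max_cached_mb also reports a used_mb figure):

    # sample per-OSC and per-mount cached memory once a second
    while sleep 1; do
        date +%T
        lctl get_param 'osc.*.osc_cached_mb' 'llite.*.max_cached_mb'
    done
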
John
On 10/28/2025 3:03 PM, lustre-discuss-request at lists.lustre.org wrote:
>
> Today's Topics:
>
> 1. issue: strange pauses between writes, but not everywhere
> (Peter Grandi)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Tue, 28 Oct 2025 20:01:41 +0000
> From: pg at lustre.list.sabi.co.UK (Peter Grandi)
> To: list Linux fs Lustre <lustre-discuss at lists.Lustre.org>
> Subject: [lustre-discuss] issue: strange pauses between writes, but
> not everywhere
> Message-ID: <yf3ms5ahk2i.fsf at petal.ty.sabi.co.uk>
> Content-Type: text/plain
>
> So I have 3 Lustre storage clusters which have recently developed a
> strange issue:
>
> * Each cluster has 1 MDT with 3 "enterprise" SSDs and 8 OSTs each with 3
> "enterprise" SSDs, MDT and OSTs all done with ZFS, on top of Alma
> 8.10. Lustre version is 2.15.7. Each server is pretty overspecified
> (28 cores, 128GiB), 100Gb/s cards and switch, and the clients are the
> same as the servers except they run the client version of the Lustre
> 2.16.1 drivers.
>
> * For an example I will use the Lustre 'temp01' where the servers have
> addresses 192.168.102.40-48, where .40 is the MDT, and the clients
> have addresses 192.168.102.13-36.
>
> * Reading is quite good for all clients. But since yesterday early
> afternoon the clients .13-36 have inexplicably had a maximum average
> write speed of around 35-40MB/s; yet if I mount 'temp01' on any of the
> Lustre servers (and I usually have it mounted on the MDT .40), write
> rates are as good as before. Mysteriously, today one of the clients
> (.14) wrote at the previous good speeds for a while and then reverted
> to slow. I was tweaking some '/proc/sys/net/ipv4/tcp_*' parameters at
> the time, but the same parameters on .13 did not improve the
> situation.
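
One thing that may help when comparing .14 against .13 is to dump the
same set of TCP knobs on both clients and diff the output; the parameters
below are just examples, not an exhaustive list:

    # print a few ipv4 TCP tunables so two clients can be compared
    for p in tcp_rmem tcp_wmem tcp_window_scaling tcp_sack; do
        printf '%s: ' "$p"; cat "/proc/sys/net/ipv4/$p"
    done
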
>
> * I have collected 'tcpdump' traces on all the 'temp01' servers and a
> client while writing and examined them with Wireshark's "TCP Stream
> Graphs" (etc.), and what is happening is that the clients send at full
> speed for a little while, then pause for around 2-3 seconds, and then
> resume. The servers, when accessing 'temp01' as clients, do not pause.
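
If you capture again, it may be worth limiting the trace to the Lustre
traffic; socklnd uses TCP port 988 by default (the interface name below
is a placeholder for the 100Gb/s NIC):

    # capture only Lustre (socklnd) traffic, truncated to headers
    tcpdump -i eth0 -s 128 -w temp01-client.pcap port 988
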
>
> * If I use NFS Ganesha with NFSv4-over-TCP on the MDT exporting 'temp01'
> I can write to that at high rates (not as high as with native Lustre
> of course).
>
> * I have used 'iperf3' to check basic network rates and for "reasons"
> they are around 25-30Gb/s, but that is still much higher than the
> observed *average* write speeds.
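
A single iperf3 stream often tops out well below line rate on 100Gb/s
links, so a few parallel streams give a more realistic ceiling, e.g.
against the MDT address from the example above:

    # server side:  iperf3 -s
    # client side:  4 parallel streams for 30 seconds
    iperf3 -c 192.168.102.40 -P 4 -t 30
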
>
> * The issue persists after rebooting the clients (I have not rebooted
> all the servers of at least one cluster, but I recently rebooted one
> of the MDTs).
>
> * I have checked the relevant switch logs and ports and there are no
> obvious errors or significant packet error/drop rates.
>
> My current guesses are some issue with IP flow control or TCP window
> size, but bare TCP with 'iperf3' and NFSv4-over-TCP both give good
> rates. So perhaps it is something weird with receive pacing in the
> Lustre LNET driver.
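
If it does turn out to be on the Lustre/LNET side rather than bare TCP,
the client-side RPC and dirty-cache settings are also worth a look; the
parameter names below are as of recent 2.15/2.16 clients:

    # per-OSC RPC concurrency and dirty-page limits on a client
    lctl get_param 'osc.*.max_rpcs_in_flight' 'osc.*.max_dirty_mb'
    # LNET network and peer configuration, including socklnd credits
    lnetctl net show -v
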
>
> Please let me know if you have seen something similar, or if you have
> other suggestions.
>
>