[lustre-discuss] [EXTERNAL] issue: strange pauses between writes, but not everywhere
Mohr, Rick
mohrrf at ornl.gov
Wed Oct 29 07:09:23 PDT 2025
Peter,
Do you see any lustre timeouts or client evictions in your logs (either server or client) that correlate with these slowdowns?
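For instance, something along these lines should surface them on both sides (a rough sketch, assuming the usual Lustre 2.15 kernel-log wording and parameter names; adjust paths and target names for your setup):

    # on a slow client: evictions/timeouts in the kernel log
    dmesg | grep -iE 'lustre.*(evict|timed out)'
    # per-target import state; 'state: FULL' is healthy, EVICTED/DISCONN is not
    lctl get_param osc.*.import | grep -E 'target:|state:'
    # on the MDS/OSS nodes: recent eviction messages
    journalctl -k --since '2 days ago' | grep -i evict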
--Rick
On 10/28/25, 4:13 PM, "lustre-discuss on behalf of Peter Grandi via lustre-discuss" <lustre-discuss-bounces at lists.lustre.org on behalf of lustre-discuss at lists.lustre.org> wrote:
So I have 3 Lustre storage clusters which have recently developed a strange issue:

* Each cluster has 1 MDT with 3 "enterprise" SSDs and 8 OSTs each with 3 "enterprise" SSDs, MDT and OSTs all done with ZFS, on top of Alma 8.10. The Lustre version is 2.15.7. Each server is pretty overspecified (28 cores, 128GiB RAM), with 100Gb/s cards and switch, and the clients are the same as the servers except they run the client version of the Lustre 2.16.1 drivers.

* As an example I will use the Lustre 'temp01', where the servers have addresses 192.168.102.40-48 (with .40 being the MDT) and some clients have addresses 192.168.102.13-36.

* Reading is quite good for all clients. But since yesterday early afternoon the clients .13-36 inexplicably have a maximum average write speed of around 35-40MB/s; yet if I mount 'temp01' on any of the Lustre servers (and I usually have it mounted on the MDT .40), write rates are as good as before. Mysteriously, today one of the clients (.14) wrote at the previous good speeds for a while and then reverted to slow. I was tweaking some '/proc/sys/net/ipv4/tcp_*' parameters at the time, but the same parameters on .13 did not improve the situation.

* I have collected 'tcpdump' traces on all the 'temp01' servers and a client while writing and examined them with Wireshark's "TCP Stream Graphs" (etc.), and what is happening is that the clients send at full speed for a little while, pause for around 2-3 seconds, and then resume. The servers, when accessing 'temp01' as clients, do not pause. (A simple way to watch the same pauses without packet captures is sketched at the end of this message.)

* If I use NFS Ganesha with NFSv4-over-TCP on the MDT exporting 'temp01', I can write to that at high rates (not as high as with native Lustre, of course).

* I have used 'iperf3' to check basic network rates and for "reasons" they are around 25-30Gb/s, but still much higher than the observed *average* write speeds.

* The issue persists after rebooting the clients (I have not rebooted all the servers of at least one cluster, but I recently rebooted one of the MDTs).

* I have checked the relevant switch logs and ports and there are no obvious errors or significant rates of packet issues.

My current guesses are some issue with IP flow control or TCP window size, but bare TCP with 'iperf3' and NFSv4-over-TCP both give good rates. So perhaps it is something weird with receive pacing in the LNET drivers (the credit counters I would look at are also sketched below). Please let me know if you have seen something similar, or have other suggestions.
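A simple way to watch the pauses from a slow client without packet captures, as mentioned above (a rough sketch; the mount point and test file name are placeholders for whatever is appropriate on 'temp01'):

    # throughput is printed as it runs; the 2-3 s pauses show up as the rate collapsing
    dd if=/dev/zero of=/mnt/temp01/pause-test bs=1M count=4096 oflag=direct status=progress
    # in another shell, the client-side RPC picture while the dd runs
    lctl get_param osc.temp01-*.rpc_stats | head -40
    rm /mnt/temp01/pause-test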
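And if it really is LNET-level pacing rather than TCP itself, credit starvation on the socklnd network should be visible from the client side; a rough sketch (assuming the 'lnetctl' that ships with 2.15/2.16 and the default 'tcp' network; the .41 NID is just an example OSS address from the range above):

    # per-interface tunables (credits, peer_credits) and send/recv/drop counters
    lnetctl net show -v
    # per-peer credit counters for one OSS NID; min_tx_credits going strongly
    # negative suggests the client is stalling while waiting for credits
    lnetctl peer show -v --nid 192.168.102.41@tcp
    # global message/drop counters
    lnetctl stats show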