[lustre-discuss] Dramatic loss of performance when another application does writing.

John Bauer bauerj at iodoctors.com
Mon Jan 12 13:48:51 PST 2026


All,

My recent questions all relate to my trying to understand the following
issue.  I have an application that writes, reads forward, and reads
backward a single file multiple times (as seen in the bottom frame of
Image 1).  The file is striped 4x16M across 4 SSD OSTs on 2 OSSs.
Everything runs along just fine, with transfer rates in the 5 GB/s range.

At some point, another application triggers approximately 135 GB of
writes to each of the 32 HDD OSTs on the 16 OSSs of the file system.
When this happens, my application's performance drops to 4.8 MB/s, a
99.9% loss of performance, for the 33+ second duration of the other
application's writes.  My application is doing 16 MB preads and pwrites
in parallel using 4 pthreads, with O_DIRECT on the client.

The main question I have is: "Why do the writes from the other
application affect my application so dramatically?"  The demands I am
placing on the 2 OSSs are of the same order of magnitude as the other
application's: about 2.5 GB/s from each of the 2 OSSs, versus the
roughly 4 GB/s each that the other application is getting from the same
2 OSSs.  There should be no competition for the OSTs, since I am using
the SSD OSTs and the other application is using the HDD OSTs.  And if
both applications are triggering direct I/O on the OSSs, I would expect
minimal competition for compute resources on the OSSs.  But as seen
below in Image 3, there is a huge spike in CPU load during the other
application's writes.

This is not a one-off event; I see it about 2 out of every 3 times I run
this job.  I suspect the other application is one that checkpoints at a
regular interval, but I am a non-root user and have no way to confirm
that.  I am using PCP/pmapi to collect the OSS data during my run.  If
the images get stripped from the email, I have included alternate text
with Dropbox links to the images.
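
For context, the access pattern is roughly equivalent to the sketch
below.  This is a minimal illustration, not my actual code; the file
name, the number of transfers per thread, and the error handling are
placeholders.

#define _GNU_SOURCE             /* for O_DIRECT */
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define XFER  (16UL * 1024 * 1024)  /* 16 MiB per request             */
#define NTHR  4                     /* 4 I/O threads                  */
#define NXFER 64                    /* transfers per thread (example) */

static int fd;

static void *worker(void *arg)
{
    long id = (long)arg;
    void *buf;

    /* O_DIRECT requires a suitably aligned buffer. */
    if (posix_memalign(&buf, 4096, XFER) != 0)
        return NULL;
    memset(buf, 0, XFER);

    /* Write the file forward... */
    for (long i = 0; i < NXFER; i++) {
        off_t off = (off_t)(i * NTHR + id) * XFER;
        if (pwrite(fd, buf, XFER, off) != (ssize_t)XFER)
            perror("pwrite");
    }
    /* ...read it forward... */
    for (long i = 0; i < NXFER; i++) {
        off_t off = (off_t)(i * NTHR + id) * XFER;
        if (pread(fd, buf, XFER, off) != (ssize_t)XFER)
            perror("pread");
    }
    /* ...then read it backward. */
    for (long i = NXFER - 1; i >= 0; i--) {
        off_t off = (off_t)(i * NTHR + id) * XFER;
        if (pread(fd, buf, XFER, off) != (ssize_t)XFER)
            perror("pread");
    }
    free(buf);
    return NULL;
}

int main(void)
{
    pthread_t tid[NTHR];

    /* "testfile" stands in for the real 4x16M-striped file. */
    fd = open("testfile", O_RDWR | O_CREAT | O_DIRECT, 0644);
    if (fd < 0) { perror("open"); return 1; }

    for (long t = 0; t < NTHR; t++)
        pthread_create(&tid[t], NULL, worker, (void *)t);
    for (long t = 0; t < NTHR; t++)
        pthread_join(tid[t], NULL);

    close(fd);
    return 0;
}

(Compile with -pthread.  The forward write / forward read / backward
read sequence is repeated for the multiple passes over the file.)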

Thanks,

John

Image 1:

https://www.dropbox.com/scl/fi/kih8qf6byl3bi5gc9r296/floorpan_oss_pause.png?rlkey=0o00o7x3oaw24h3cl3dyxyb2p&st=wahbm0gg&dl=0


Image 2:

https://www.dropbox.com/scl/fi/e36jjoomqa3xkadcyhdw9/disk_V_RTC.png?rlkey=ujzx02n3us42ga9prsxm5dbkh&st=ato9s3gj&dl=0


Image 3:

https://www.dropbox.com/scl/fi/bzudgnwnecvkp3ra4kjvp/kernelAllLoad.png?rlkey=fni6lv4zwbt53aprg6twjmnsv&st=sy9expz6&dl=0

