[lustre-discuss] fio and lustre performance

Andreas Dilger adilger at whamcloud.com
Thu Aug 25 16:37:09 PDT 2022


No comment on the actual performance issue, but we normally test fio using the libaio interface (which is handled in the kernel) instead of posixaio (which is handled by threads in userspace, AFAIK), and we also use Direct I/O to avoid memory copies (which is fine as long as there are enough I/O requests in flight).  It should be a relatively easy change to see if that improves the behaviour.
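A sketch of that change against the fio command John posted below (same workload, only the two flags swapped; all flag names are standard fio options, and ${fileName} is John's placeholder):

```shell
# Same 40G random-write workload, but with in-kernel AIO and
# Direct I/O.  --direct=1 replaces --buffered=1 and requires
# aligned I/O, which the 1M block size satisfies.
fio --randrepeat=1 \
    --ioengine=libaio \
    --direct=1 \
    --gtod_reduce=1 \
    --name=test \
    --filename=${fileName} \
    --bs=1M \
    --iodepth=64 \
    --size=40G \
    --readwrite=randwrite
```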

Other things to check: that osc.*.max_dirty_mb and llite.*.max_cached_mb are not hitting their limits and throttling I/O until the data is flushed, and that osc.*.max_rpcs_in_flight is *at least* 64 across the OSCs, to keep up with the input generation.
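On the client those tunables can be inspected and raised with lctl, roughly as follows (run as root; parameter names as in Lustre 2.12):

```shell
# Check the current client-side limits
lctl get_param osc.*.max_dirty_mb        # per-OSC dirty-data limit
lctl get_param llite.*.max_cached_mb     # per-mount page-cache limit
lctl get_param osc.*.max_rpcs_in_flight  # concurrent RPCs per OSC

# Raise RPC concurrency if it is below 64.  This is not persistent
# across remounts; use "lctl set_param -P" on the MGS to make a
# setting permanent.
lctl set_param osc.*.max_rpcs_in_flight=64
```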

When you consider that Lustre (a distributed, coherent, persistent network filesystem) delivers "only half" the performance (28s vs. 13s) of a local, ephemeral, RAM-based filesystem, it isn't doing too badly...

Cheers, Andreas

On Aug 25, 2022, at 11:29, John Bauer <bauerj at iodoctors.com> wrote:

Hi all,

I'm trying to figure out an odd behavior when running an fio ( https://git.kernel.dk/cgit/fio/ ) benchmark on a Lustre file system.

fio --randrepeat=1  \
   --ioengine=posixaio  \
   --buffered=1  \
   --gtod_reduce=1  \
   --name=test  \
   --filename=${fileName}  \
   --bs=1M  \
   --iodepth=64  \
   --size=40G  \
   --readwrite=randwrite

In short, the application queues 40,000 random aio_write64(nbyte=1M) requests to a maximum depth of 64, doing aio_suspend64 followed by aio_write to keep 64 aio requests outstanding.  My I/O library that processes the aio requests does so with 4 pthreads removing aio requests from the queue and performing the I/Os as pwrite64()s.

The odd behavior is the intermittent pauses that can be seen in the first plot below.  The X-axis is wall clock time, in seconds, and the left Y-axis is file position.  The horizontal blue lines indicate how long each pwrite64() is active and where in the file the I/O is occurring.  The right Y-axis is the cumulative CPU time for both the process and the kernel during the run.  There is minimal user CPU time for either the process or the kernel.  The cumulative system CPU time attributable to the process (the red line) runs at a slope of ~4 system CPU seconds per wall clock second, which makes sense since there are 4 pthreads at work in the user process.  The cumulative system CPU time for the kernel as a whole (the green line) runs at ~12 system CPU seconds per wall clock second.  Note that during the pauses the system CPU accumulation drops to near zero (zero slope).

This is run on a dedicated 40-core Ivy Bridge node (Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz) with 64G of memory.

The file is striped as a single-component PFL layout, 8x1M.  The Lustre version is 2.12.8 ddn12.

Does anyone have any ideas what is causing the pauses?  Is there something else I could be looking at in the /proc file system to gain insight?
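For example, one thing I could imagine sampling alongside the run is the global dirty/writeback page counters, along these lines (standard /proc/meminfo fields; just a sketch):

```shell
# Sample the system-wide dirty and writeback page counters while
# fio runs.  Pauses that line up with "Dirty" plateauing at the
# vm.dirty_ratio limit would point at writeback throttling.
# (Fixed number of samples here; in practice loop until fio exits.)
for i in $(seq 1 3); do
    grep -E '^(Dirty|Writeback):' /proc/meminfo
    sleep 1
done
```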

For comparison, the 2nd plot below is from a run on /tmp.  Note that some pwrite64() calls take a long time, but a single slow pwrite64() does not stall all the other pwrite64() calls active during the same period.  Elapsed time for /tmp is 13 seconds; for Lustre it is 28 seconds.  Both runs are essentially memory resident.

Just for completeness I have added a 3rd plot which is the amount of memory each of the OSC clients is consuming over the course of the Lustre run.  Nothing unusual there.  The memory consumption rate slows down during the pauses as one would expect.

I don't think the instrumentation is the issue: there is not much more instrumentation occurring in the Lustre run than in the /tmp run, and the instrumentation totals less than 6 MB in each case.

John

In case the images got stripped, here are some Dropbox URLs:

plot1 : https://www.dropbox.com/s/ey217o053gdyse5/plot1.png?dl=0

plot2 : https://www.dropbox.com/s/vk23vmufa388l7h/plot2.png?dl=0

plot3 : https://www.dropbox.com/s/vk23vmufa388l7h/plot2.png?dl=0





_______________________________________________
lustre-discuss mailing list
lustre-discuss at lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Whamcloud






