[lustre-discuss] Reasons why LLITE statistics would not capture activity
Andreas Dilger
adilger at ddn.com
Thu Nov 21 12:27:01 PST 2024
Is it possible these applications are using mmap() to do the IO? I'm not sure whether mmap is (or can be) effectively tracked at the user/kernel interface, which is what the llite stats show.
You _might_ be able to see the page faults in the "vmstat 1" output?
I'm of course happy to be proven wrong by someone adding stats counters for this (e.g. counting page faults).
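One way to test the mmap() hypothesis is to read the same file twice, once with ordinary read() calls and once through a read-only mapping, and compare the llite counters after each pass. The following is only a minimal sketch of such a reproducer. The function names are made up for illustration, and it assumes nothing about Lustre itself: it just generates the two access patterns.

```python
import mmap
import os


def read_with_syscall(path, chunk=1 << 20):
    """Read the whole file with plain read() calls; return bytes read."""
    total = 0
    # buffering=0 so each f.read() maps to a read() system call.
    with open(path, "rb", buffering=0) as f:
        while True:
            buf = f.read(chunk)
            if not buf:
                break
            total += len(buf)
    return total


def read_with_mmap(path):
    """Touch every page of the file through a read-only mapping.

    The data is fetched via page faults rather than read() calls.
    Returns the number of bytes touched.
    """
    size = os.path.getsize(path)
    if size == 0:
        return 0  # mmap cannot map an empty file
    total = 0
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            for off in range(0, size, mmap.PAGESIZE):
                total += len(m[off:off + mmap.PAGESIZE])
    return total
```

Run each function against a file on the Lustre mount, using a different file or a cold cache for each pass so that cached pages do not hide the I/O, and check the llite stats file in between. If read_bytes only grows after the read() pass, that would support the mmap() explanation.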
Cheers, Andreas
On Nov 19, 2024, at 07:55, Martin, Philipp <pm.martin at itc.rwth-aachen.de> wrote:
Hi all,
We are having an issue where the statistics file in `.../lustre/llite/*/stats` does not show the read or write bytes for some traffic.
File opens & closes are being recorded and the read/write activity is shown in the `osc/*/stats` files as expected, but it would be more convenient to see the aggregated results rather than having to sum up the data for every storage target.
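Until the llite counters cover this traffic, the per-target sum can be scripted. The sketch below is only an illustration: it assumes each counter line in a stats file has the shape `name count samples [unit] min max sum ...`, which should be checked against the actual files on the client, and the helper names are hypothetical. The glob pattern is passed in by the caller, since the stats location differs between Lustre versions.

```python
import glob


def parse_stats(text):
    """Return {counter_name: sum} for one stats file.

    Assumes lines like: name count samples [unit] min max sum ...
    Lines that do not match (e.g. snapshot_time) are skipped.
    """
    sums = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 7 and fields[2] == "samples":
            sums[fields[0]] = int(fields[6])
    return sums


def sum_counters(texts, names=("read_bytes", "write_bytes")):
    """Add up the selected counters over several stats file contents."""
    totals = dict.fromkeys(names, 0)
    for text in texts:
        parsed = parse_stats(text)
        for name in names:
            totals[name] += parsed.get(name, 0)
    return totals


def read_stats_files(pattern):
    """Read every file matching a caller-supplied glob pattern."""
    texts = []
    for path in sorted(glob.glob(pattern)):
        with open(path) as f:
            texts.append(f.read())
    return texts
```

For example, `sum_counters(read_stats_files(pattern))`, with `pattern` pointing at the `osc/*/stats` files on the client, returns one read_bytes and one write_bytes total across all targets.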
What would cause traffic to not be shown under llite? Can certain I/O bypass the LLITE subsystem?
For reference, we have noticed this specifically for machine learning tools using PyTorch and NVIDIA DALI.
I'd be grateful for any hints!
Philipp
_______________________________________________
lustre-discuss mailing list
lustre-discuss at lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org