[lustre-discuss] Reasons why LLITE statistics would not capture activity

Andreas Dilger adilger at ddn.com
Thu Nov 21 12:27:01 PST 2024


Is it possible that these applications are using mmap() to do the I/O?  I'm not sure whether mmap activity is (or can be) effectively tracked at the user/kernel interface, which is what the llite stats are showing.
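To check whether mmap() is the cause, one option is to read a test file on the Lustre mount through a memory mapping and compare the llite `read_bytes` counter before and after. The following is only a minimal sketch: `read_via_mmap` is a hypothetical helper name, and whether llite counts this access is exactly the open question.

```python
import mmap
import os


def read_via_mmap(path):
    """Touch every page of `path` through a read-only mapping.

    The file data is not fetched with read() calls; the kernel brings
    pages in through page faults as the mapping is accessed.
    Returns the number of bytes accessed.
    """
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        if size == 0:
            return 0  # an empty file cannot be mapped
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            total = 0
            # Step through the mapping one page at a time.
            for off in range(0, size, mmap.PAGESIZE):
                total += len(m[off:off + mmap.PAGESIZE])
            return total
```

If the file is already in the client page cache, no new I/O will be generated, so a file that has not been read recently on that client is the better test.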

You _might_ be able to see the page faults in the "vmstat 1" output?
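As a rough alternative to watching "vmstat 1", the kernel's cumulative `pgfault` and `pgmajfault` counters in /proc/vmstat can be sampled directly. This is a sketch under the assumption that those two counters are a useful proxy; they are system-wide and not specific to Lustre, and the function names are made up for illustration.

```python
import time


def parse_vmstat(text):
    """Parse "name value" lines (the /proc/vmstat layout) into a dict."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def fault_delta(before, after, keys=("pgfault", "pgmajfault")):
    """Return how much each selected counter grew between two samples."""
    return {k: after.get(k, 0) - before.get(k, 0) for k in keys}


def sample_faults(interval=1.0, path="/proc/vmstat"):
    """Read the counters twice, `interval` seconds apart, and diff them."""
    with open(path) as f:
        before = parse_vmstat(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_vmstat(f.read())
    return fault_delta(before, after)
```

Calling `sample_faults()` while the PyTorch/DALI job is running, and again while the node is idle, should show whether the fault rate rises with the untracked traffic.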

I'm of course happy to be proven wrong, e.g. by someone adding stats counters for this (counting page faults, etc.).

Cheers, Andreas

On Nov 19, 2024, at 07:55, Martin, Philipp <pm.martin at itc.rwth-aachen.de> wrote:



Hi all,


We are having an issue where the statistics file in `.../lustre/llite/*/stats` does not show the read or write bytes for some traffic.

File opens & closes are being recorded and the read/write activity is shown in the `osc/*/stats` files as expected, but it would be more convenient to see the aggregated results rather than having to sum up the data for every storage target.
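Until llite shows this traffic, the per-target numbers can be summed with a few lines of script. The sketch below assumes each counter line has the layout `name count samples [unit] min max sum`, so the byte total is the seventh field; that layout and the function name `sum_rw_bytes` are assumptions to verify against the actual output on your Lustre version.

```python
def sum_rw_bytes(stats_text):
    """Sum read_bytes and write_bytes over concatenated osc stats output.

    Assumed line layout: name count "samples" [unit] min max sum
    Lines that do not match (headers, snapshot_time, other counters)
    are ignored.
    """
    totals = {"read_bytes": 0, "write_bytes": 0}
    for line in stats_text.splitlines():
        fields = line.split()
        if len(fields) >= 7 and fields[0] in totals and fields[2] == "samples":
            totals[fields[0]] += int(fields[6])
    return totals
```

The input can be the concatenated contents of the `osc/*/stats` files, or the output of `lctl get_param osc.*.stats`, passed in as a single string.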


What would cause traffic to not be shown under llite? Can certain I/O bypass the LLITE subsystem?

For reference, we have noticed this specifically for machine learning tools using PyTorch and NVIDIA DALI.


I'd be grateful for any hints!

Philipp
