[lustre-discuss] Huge amounts of reads caused by shared library access
Laifer, Roland (SCC)
roland.laifer at kit.edu
Thu Sep 12 03:41:45 PDT 2024
Dear Lustre admins,
I wanted to share an issue which we have been seeing for about two
years. Maybe the issue also exists at your site, or you can provide
hints on how it can be alleviated.
The issue is that we have huge amounts of read operations on servers
which seem to be caused by shared libraries stored on Lustre. Apparently
the Lustre client cache does not work here as expected for many
different applications. Note that we have installed most software
packages on Lustre and if you don't do that you might not be affected.
Of course we have reported the issue to DDN support a long time ago.
They found an issue which might be causing it, see
https://jira.whamcloud.com/browse/LU-17463. However, the patch has been
under development for many months and I'm not sure whether it will
really fix it.
Some more details:
The affected system has nearly 1000 nodes, is used by more than 1000
active users and there are many small jobs which share the same node.
The Lustre version on clients and servers is 2.12.9 with patches from
DDN. The issue is currently causing multiple GB/s of throughput and more
than 100 K IOPS on the affected file system.
With Lustre jobstats we saw that some jobs were generating hundreds of
millions of read operations. Other similar jobs did not have the issue, i.e.
the problem is not easily reproducible. We have a complicated reproducer
which works in most cases even on our test system.
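For anyone who wants to look for such jobs at their own site, here is a
minimal sketch of how jobstats text could be ranked by read operations.
It assumes the YAML-like job_stats layout with a "read_bytes: { samples:
... }" entry per job; the sample text, job IDs and numbers below are made
up for illustration, and the function names are hypothetical helpers,
not part of Lustre.

```python
import re

# Illustrative text only: assumed job_stats layout, invented values.
SAMPLE_JOB_STATS = """\
job_stats:
- job_id:          1001
  snapshot_time:   1726137705
  read_bytes:      { samples: 250000000, unit: bytes, min: 4096, max: 4096, sum: 1024000000000 }
  write_bytes:     { samples: 10, unit: bytes, min: 4096, max: 1048576, sum: 5242880 }
- job_id:          1002
  snapshot_time:   1726137705
  read_bytes:      { samples: 1200, unit: bytes, min: 4096, max: 1048576, sum: 314572800 }
  write_bytes:     { samples: 300, unit: bytes, min: 4096, max: 1048576, sum: 104857600 }
"""

JOB_RE = re.compile(r"^-\s+job_id:\s+(\S+)")
READ_RE = re.compile(r"^\s+read_bytes:\s+\{\s*samples:\s*(\d+)")


def read_samples_per_job(text):
    """Return {job_id: read sample count}, summing repeated job entries."""
    per_job = {}
    current_job = None
    for line in text.splitlines():
        job_match = JOB_RE.match(line)
        if job_match:
            current_job = job_match.group(1)
            continue
        read_match = READ_RE.match(line)
        if read_match and current_job is not None:
            count = int(read_match.group(1))
            per_job[current_job] = per_job.get(current_job, 0) + count
    return per_job


def top_readers(text, n=5):
    """Return the n jobs with the most read samples, largest first."""
    per_job = read_samples_per_job(text)
    return sorted(per_job.items(), key=lambda kv: kv[1], reverse=True)[:n]


if __name__ == "__main__":
    for job_id, samples in top_readers(SAMPLE_JOB_STATS):
        print(job_id, samples)
```

In practice the text would come from the job_stats output of each server
and the per-job counts would have to be summed across all targets, which
is why the helper adds up repeated job IDs instead of overwriting them.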
Several users reported that they were only using software on the
affected file system. The command "lctl get_param
llite.<fs_name>*.stats" showed huge amounts of page_fault entries and
there were indeed many page faults for shared libraries stored on the
affected file system.
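To check a client for the same symptom, the page_fault counter can be
pulled out of the saved output of that command. The sketch below assumes
the usual "<name> <count> samples [unit]" line layout of the llite
stats; the sample text and its numbers are invented for illustration,
and the helper names are hypothetical.

```python
import re

# Illustrative text only: mimics the assumed stats layout, invented values.
SAMPLE_STATS = """\
snapshot_time             1726137705.123456 secs.usecs
read_bytes                1200 samples [bytes] 4096 1048576 314572800
open                      350 samples [regs]
page_fault                98765432 samples [regs]
"""

COUNTER_RE = re.compile(r"^(\S+)\s+(\d+)\s+samples\b")


def parse_llite_stats(text):
    """Return {counter_name: sample_count}; non-counter lines are skipped."""
    counters = {}
    for line in text.splitlines():
        match = COUNTER_RE.match(line)
        if match:
            counters[match.group(1)] = int(match.group(2))
    return counters


def page_fault_delta(before, after):
    """Page faults between two parsed snapshots (missing counter counts as 0)."""
    return after.get("page_fault", 0) - before.get("page_fault", 0)


if __name__ == "__main__":
    counters = parse_llite_stats(SAMPLE_STATS)
    print("page_fault samples:", counters.get("page_fault", 0))
```

Taking two snapshots around a suspicious job and comparing them with
page_fault_delta() gives a rate rather than a total since mount, which
is easier to compare between nodes.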
We also had discussions with another site where Lustre is provided by
another vendor and they are seeing the same issue.
Regards,
Roland