[lustre-discuss] wrong cache result
K. Scott Rowe
krowe at nrao.edu
Tue Apr 11 07:41:54 PDT 2023
We have a cluster of diskless nodes called nmpost running RHEL 7.8, kernel
3.10.0-1127.13.1.el7.x86_64, and Lustre version 2.12.5. Checking the md5sum
of a specific file on Lustre shows that most hosts get the correct result, e.g.:
nmpost061 010585dfa7a66ae60b887a843056a4ec /lustre/aoc/cluster/pipeline/dsoc-prod/workspaces/sbin/casa_envoy
but a few get different results:
nmpost073 e4c7c2eceec068ab061151866e2a0d64 /lustre/aoc/cluster/pipeline/dsoc-prod/workspaces/sbin/casa_envoy
nmpost077 61d1d2bc7a86b53334d005a72603d8a1 /lustre/aoc/cluster/pipeline/dsoc-prod/workspaces/sbin/casa_envoy
nmpost081 e4c7c2eceec068ab061151866e2a0d64 /lustre/aoc/cluster/pipeline/dsoc-prod/workspaces/sbin/casa_envoy
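To spot outliers like these across many nodes, the per-host output can be grouped by checksum. What follows is only a sketch: it assumes the output has already been collected as lines of the form "host md5 path" (as in the listing above), and the function name group_by_checksum is made up for illustration.

```python
from collections import defaultdict


def group_by_checksum(lines):
    """Group hostnames by the md5sum each one reported.

    Each line is assumed to look like: "<host> <md5> <path>".
    """
    groups = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        host, digest = fields[0], fields[1]
        groups[digest].append(host)
    return dict(groups)


if __name__ == "__main__":
    path = "/lustre/aoc/cluster/pipeline/dsoc-prod/workspaces/sbin/casa_envoy"
    report = [
        f"nmpost061 010585dfa7a66ae60b887a843056a4ec {path}",
        f"nmpost073 e4c7c2eceec068ab061151866e2a0d64 {path}",
        f"nmpost077 61d1d2bc7a86b53334d005a72603d8a1 {path}",
        f"nmpost081 e4c7c2eceec068ab061151866e2a0d64 {path}",
    ]
    # One line per distinct checksum, followed by the hosts that reported it.
    for digest, hosts in group_by_checksum(report).items():
        print(digest, " ".join(hosts))
```

With the four lines above, nmpost073 and nmpost081 end up in the same group, which fits the idea that they are both seeing the same older version of the file.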
This casa_envoy file was updated about 10 days ago and it looks like
the hosts that see the wrong md5sum are seeing a previous version of
this file.
Either rebooting or running "echo 3 > /proc/sys/vm/drop_caches" on one
of these hosts causes it to see the correct md5sum
(010585dfa7a66ae60b887a843056a4ec). So it seems that the Linux page
cache is not getting updated with the new version of this file even after
running md5sum on it multiple times.
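A narrower test than dropping every cache on the node would be to ask the kernel to evict only this one file's cached pages and then re-read it. The sketch below is an assumption, not something tried on these hosts: it uses posix_fadvise with POSIX_FADV_DONTNEED, which is advisory only, and whether a Lustre 2.12 client actually discards stale pages in response is not known here. The function names are made up for illustration.

```python
import hashlib
import os


def md5_of(path, chunk_size=1 << 20):
    """Return the hex md5 of a file, read in chunks through the page cache."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def md5_before_and_after_evict(path):
    """Checksum a file, advise the kernel to drop its cached pages, checksum again.

    Returns (before, after). If the two differ, the first read was probably
    served from stale cached pages.
    """
    before = md5_of(path)
    fd = os.open(path, os.O_RDONLY)
    try:
        if hasattr(os, "posix_fadvise"):
            # offset 0, length 0 means "the whole file"; this is advice only,
            # so the kernel (or the filesystem client) may ignore it.
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)
    after = md5_of(path)
    return before, after


if __name__ == "__main__":
    target = "/lustre/aoc/cluster/pipeline/dsoc-prod/workspaces/sbin/casa_envoy"
    old, new = md5_before_and_after_evict(target)
    print("before:", old)
    print("after: ", new)
```

If the checksum changes after the eviction on an affected host, that would point at stale data pages for this file specifically rather than at something system-wide.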
Any ideas on why this is? Is there a known issue between Lustre and
the Linux page cache?
Thanks