[lustre-discuss] Lustre caching behavior
Andreas Dilger
adilger at thelustrecollective.com
Wed Mar 25 02:16:33 PDT 2026
On Mar 24, 2026, at 17:56, Oleg Drokin <green at whamcloud.com> wrote:
> On Tue, 2026-03-24 at 21:08, Andreas Dilger <adilger at thelustrecollective.com> wrote:
>> whether it would
>> be possible to downgrade the DLM write locks to read locks while
>> preserving the cached data on the client(s).
>
> I think we have long wanted this functionality but never implemented it.
I know Mike implemented DLM lock downgrade for IBITS locks, and AFAIK
that is in use, but I'm not aware of any implementation of extent lock
downgrade. I couldn't find anything in Jira (either open or closed).
> On Tue, 2026-03-24 at 23:42 +0000, Patrick Farrell wrote:
>>
>> Are you able to pin this down into more of a reproducer? Even just a
>> more granular description.
>>
>> I’m curious to explore it - this is poor behavior, and certainly not
>> desirable. I’m particularly curious about the lock cancellation: my
>> understanding had been that the glimpse path for a read lock request
>> was entirely opportunistic (NONBLOCKING in ldlm speak) and would never
>> cause a cancel (i.e., my understanding doesn’t accord with Andreas’s).
>> I was pretty sure about that.
>
> We definitely consider dropping the unused lock after a glimpse:
> /**
>  * Callback handler for receiving incoming glimpse ASTs.
>  *
>  * This only can happen on client side. After handling the glimpse AST
>  * we also consider dropping the lock here if it is unused locally for a
>  * long time.
>  */
> static void ldlm_handle_gl_callback(struct ptlrpc_request *req,
>                                     struct ldlm_namespace *ns,
>                                     struct ldlm_request *dlm_req,
>                                     struct ldlm_lock *lock)
>
> ...
>         if (lock->l_granted_mode == LCK_PW &&
>             !lock->l_readers && !lock->l_writers &&
>             ktime_after(ktime_get(),
>                         ktime_add(lock->l_last_used, ns->ns_dirty_age_limit))) {
>                 ...
>                 if (ldlm_bl_to_thread_lock(ns, ld, lock))
>                         ldlm_handle_bl_callback(ns, ld, lock);
>
> And ns_dirty_age_limit is 10 seconds.
So it does appear that the code will only consider cancelling a write (LCK_PW)
DLM lock that has no local readers or writers and has been idle for longer
than ns_dirty_age_limit, and never a read lock, which is good. Implementing
extent lock downgrade would at least avoid discarding the (otherwise idle)
client cache when "ls" glimpses the file, but it would not totally eliminate
DLM lock cancellation in the background.
It looks like there is a build-time "LDLM_DIRTY_AGE_LIMIT (10)" constant used
to initialize "ns->ns_dirty_age_limit", but unlike most such settings today it
is not exposed as a runtime tunable parameter.
John, if you can rebuild your clients you could change this to something longer
(e.g. 60s), or make it partially tunable with "max(10, ns->ns_lru_max_age/30)"
without going through developing a new parameter. A proper new parameter could
also be cloned in a straightforward manner from lru_max_age_{show,store}().
Increasing this too much could negatively impact files that are accessed by
many clients after the initial write.
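The "max(10, ns->ns_lru_max_age/30)" suggestion amounts to deriving the
limit from the existing LRU age setting instead of adding a new tunable.
A minimal sketch, assuming both values are plain seconds (the real namespace
fields may use other units, e.g. ktime_t or milliseconds); dirty_age_limit_s()
and the SKETCH_* names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Floor matches the current build-time LDLM_DIRTY_AGE_LIMIT (10 s). */
#define SKETCH_DIRTY_AGE_FLOOR_S 10u
#define SKETCH_LRU_AGE_DIVISOR   30u

static_assert(SKETCH_LRU_AGE_DIVISOR > 0, "divisor must be nonzero");

/* Hypothetical: limit = max(10 s, lru_max_age / 30), all in seconds. */
static uint32_t dirty_age_limit_s(uint32_t lru_max_age_s)
{
    uint32_t derived = lru_max_age_s / SKETCH_LRU_AGE_DIVISOR;

    /* Never go below the existing 10 s default. */
    return derived > SKETCH_DIRTY_AGE_FLOOR_S ? derived
                                              : SKETCH_DIRTY_AGE_FLOOR_S;
}
```

For example, an LRU max age of 3900 s would give a 130 s limit, while any LRU
max age below 330 s leaves it at the current 10 s.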
Cheers, Andreas