[lustre-discuss] LDLM locks not expiring/cancelling

Steve Crusan stevec at dug.com
Thu Jan 2 11:24:31 PST 2020


Hi all,

We are running into a bizarre situation where stale locks are not cancelling
themselves, and even worse, it seems as if
ldlm.namespaces.*.lru_size is being ignored.

For instance, I unmount our Lustre file systems on a client machine, then
remount. Next, I run "lctl set_param ldlm.namespaces.*.lru_max_age=60s" and
"lctl set_param ldlm.namespaces.*.lru_size=1024". In theory this should cap
each OSC at 1024 LDLM locks, after which I'd expect to see a lot of lock
cancels (via ldlm.namespaces.${ost}.pool.stats). We should also see cancels
once a lock has been granted for longer than lru_max_age.
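
For reference, this is roughly the sequence I run after the remount, and how
I watch for the cancels:

"""
# cap the client-side LDLM LRU right after the remount
lctl set_param ldlm.namespaces.*.lru_max_age=60s
lctl set_param ldlm.namespaces.*.lru_size=1024

# then watch per-OSC lock counts and cancel activity
lctl get_param ldlm.namespaces.*.lock_count
lctl get_param ldlm.namespaces.*.pool.stats
"""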

We can trigger this simply by running 'find' on the root of our Lustre file
system and waiting for a while. Eventually the client's SUnreclaim value
bloats to 60-70GB (!!!), and each of our OSTs has 30-40k LRU locks (via
lock_count). This is early in the process:

"""
ldlm.namespaces.h5-OST003f-osc-ffff8802d8559000.lock_count=2090
ldlm.namespaces.h5-OST0040-osc-ffff8802d8559000.lock_count=2127
ldlm.namespaces.h5-OST0047-osc-ffff8802d8559000.lock_count=52
ldlm.namespaces.h5-OST0048-osc-ffff8802d8559000.lock_count=1962
ldlm.namespaces.h5-OST0049-osc-ffff8802d8559000.lock_count=1247
ldlm.namespaces.h5-OST004a-osc-ffff8802d8559000.lock_count=1642
ldlm.namespaces.h5-OST004b-osc-ffff8802d8559000.lock_count=1340
ldlm.namespaces.h5-OST004c-osc-ffff8802d8559000.lock_count=1208
ldlm.namespaces.h5-OST004d-osc-ffff8802d8559000.lock_count=1422
ldlm.namespaces.h5-OST004e-osc-ffff8802d8559000.lock_count=1244
ldlm.namespaces.h5-OST004f-osc-ffff8802d8559000.lock_count=1117
ldlm.namespaces.h5-OST0050-osc-ffff8802d8559000.lock_count=1165
"""

But this grows over time, and eventually the compute node gets evicted by the
MDS (after 10 minutes of cancelling locks/hanging). The only way we have been
able to reduce the slab usage is to drop caches and set lru_size=clear, but
the problem just comes back depending on the workload.
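
The workaround, for what it's worth, is just:

"""
# cancel all unused client locks and flush caches -- the only thing that
# brings SUnreclaim back down for us
lctl set_param ldlm.namespaces.*.lru_size=clear
echo 3 > /proc/sys/vm/drop_caches
"""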

We are running 2.10.3 on the client side and 2.10.1 on the server side. Have
there been any fixes landed on the 2.10 branch that we need to apply? This
seems to be the closest to what we are experiencing:

https://jira.whamcloud.com/browse/LU-11518


PS: I've checked other systems across our cluster, and some of them have as
many as 50k locks per OST. I suspect these locks are staying around much
longer than the lru_max_age default (65 minutes), but I cannot prove that. Is
there a good way to translate held locks to FIDs? I have been messing around
with lctl set_param debug="XXX" and lctl set_param
ldlm.namespaces.*.dump_namespace, but I don't feel like I'm getting *all*
of the locks.
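
In case it helps, this is roughly what I've been trying; the "=1" on
dump_namespace and the mount point in the fid2path example are my guesses,
and the FID shown is made up:

"""
# enable DLM tracing so the namespace dump lands in the debug log
lctl set_param debug=+dlmtrace

# dump the client lock namespaces (the "=1" is a guess; the file is write-only)
lctl set_param ldlm.namespaces.*.dump_namespace=1

# pull the debug buffer out to a file and look at the 'res:' triples
lctl dk /tmp/ldlm_dump.txt

# the resource triples look like FIDs; trying to resolve one
# (made-up FID; /lustre/h5 is just where we mount the fs)
lfs fid2path /lustre/h5 '[0x200000bd1:0x4a2:0x0]'
"""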

~Steve