[Lustre-discuss] ldlm_locks memory usage crashes OSS

Mohr Jr, Richard Frank (Rick Mohr) rmohr at utk.edu
Wed Sep 26 10:07:28 PDT 2012


On Sep 26, 2012, at 11:34 AM, Jérémie Dubois-Lacoste wrote:

> We have typically ~80 clients running.
> In our case the locks were eating memory on the oss, not the mds, but
> yes it might
> still have the same cause.

Hmmm.  With only 80 clients I wouldn't expect that locks would use up too much memory.  But the number of cached locks is controlled on a per-target basis, so if the OSS node hosts quite a few OSTs, the aggregate number of locks could get large.

> I'm not sure how to find the lifespan and number of locks per node. I
> think we're using the default settings. Do you know how to check this?

Take a look at /proc/fs/lustre/ldlm/namespaces/*/{lru_max_age,lru_size}. The lru_max_age file shows the amount of time a lock will be cached before it ages out.  The lru_size file shows the max number of locks that will be cached.
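For what it's worth, here is a quick sketch of how you might inspect those tunables (and optionally cap the lock cache) from the OSS shell.  The lru_size value of 1200 below is purely illustrative, not a recommendation, and paths/syntax may vary slightly between Lustre versions:

```shell
# Show the cached-lock limits for every ldlm namespace on this node
cat /proc/fs/lustre/ldlm/namespaces/*/lru_max_age
cat /proc/fs/lustre/ldlm/namespaces/*/lru_size   # 0 means dynamic LRU sizing

# Count how many namespaces (targets) this OSS serves -- the per-target
# limits multiply across all of them
ls -d /proc/fs/lustre/ldlm/namespaces/* | wc -l

# Optionally cap the number of cached locks per namespace with lctl
# (1200 is only an example value)
lctl set_param ldlm.namespaces.*.lru_size=1200
```

Setting lru_size to a fixed value disables the dynamic sizing, so you'd want to test whether a static cap actually helps before leaving it in place.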

-- 
Rick Mohr
Senior HPC System Administrator
National Institute for Computational Sciences
http://www.nics.tennessee.edu