[lustre-discuss] Lustre client memory and MemoryAvailable

Nathan Dauchy - NOAA Affiliate nathan.dauchy at noaa.gov
Wed Apr 24 14:56:07 PDT 2019


On Mon, Apr 15, 2019 at 9:18 PM Jacek Tomaka <jacekt at dug.com> wrote:

>
> >signal_cache should have one entry for each process (or thread-group).
>
> That is what I thought as well; looking at the kernel source, allocations
> from signal_cache happen only during fork.
>
>
I was recently chasing an issue with clients suffering from low memory and
saw that "signal_cache" was a major consumer.  But the workload on those
clients was not doing much forking (and I don't *think* much threading
either); rather, it was doing a LOT of metadata read operations.

You can see the symptoms by a simple "du" on a Lustre file system:

# grep signal_cache /proc/slabinfo
signal_cache         967   1092   1152   28    8 : tunables    0    0    0 : slabdata     39     39      0

# du -s /mnt/lfs1/projects/foo
339744908 /mnt/lfs1/projects/foo

# grep signal_cache /proc/slabinfo
signal_cache      164724 164724   1152   28    8 : tunables    0    0    0 : slabdata   5883   5883      0

# slabtop -s c -o | head -n 20
 Active / Total Objects (% used)    : 3660791 / 3662863 (99.9%)
 Active / Total Slabs (% used)      : 93019 / 93019 (100.0%)
 Active / Total Caches (% used)     : 72 / 107 (67.3%)
 Active / Total Size (% used)       : 836474.91K / 837502.16K (99.9%)
 Minimum / Average / Maximum Object : 0.01K / 0.23K / 12.75K

  OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
164724 164724 100%    1.12K   5883       28    188256K signal_cache
331712 331712 100%    0.50K  10366       32    165856K ldlm_locks
656896 656896 100%    0.12K  20528       32     82112K kmalloc-128
340200 339971  99%    0.19K   8100       42     64800K kmalloc-192
162838 162838 100%    0.30K   6263       26     50104K osc_object_kmem
744192 744192 100%    0.06K  11628       64     46512K kmalloc-64
205128 205128 100%    0.19K   4884       42     39072K dentry
  4268   4256  99%    8.00K   1067        4     34144K kmalloc-8192
162978 162978 100%    0.17K   3543       46     28344K vvp_object_kmem
162792 162792 100%    0.16K   6783       24     27132K kvm_mmu_page_header
162825 162825 100%    0.16K   6513       25     26052K sigqueue
 16368  16368 100%    1.02K    528       31     16896K nfs_inode_cache
 20385  20385 100%    0.58K    755       27     12080K inode_cache

Repeat that for more (and bigger) directories, and the slab caches add up to
more than half the memory on this 24GB node.
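For what it's worth, slabtop's CACHE SIZE column follows directly from the
/proc/slabinfo numbers: slabs x pagesperslab x page size. A quick sketch of
that arithmetic for the signal_cache line above (assuming 4 KiB pages, as on
this x86_64 node):

```shell
# Sketch: reproduce slabtop's CACHE SIZE for signal_cache from the
# /proc/slabinfo columns quoted above (num_objs, objperslab, pagesperslab).
objs=164724 objperslab=28 pagesperslab=8
slabs=$(( (objs + objperslab - 1) / objperslab ))   # 5883 slabs
cache_kb=$(( slabs * pagesperslab * 4096 / 1024 ))
echo "signal_cache footprint: ${cache_kb}K"          # 188256K, matching slabtop
```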

This is with CentOS-7.6 and lustre-2.10.5_ddn6.

I worked around the problem by tackling the "ldlm_locks" memory usage with:
# lctl set_param ldlm.namespaces.lfs*.lru_max_age=10000

...but I did not find a way to reduce the "signal_cache".
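For reference, the lru_max_age workaround can be scripted together with an
immediate flush of the lock LRU ("lru_size=clear" is a standard LDLM knob).
This is a hedged sketch: the unit and best value for lru_max_age depend on
the Lustre version, and 10000 is simply the value used here.

```shell
# Sketch of the ldlm_locks workaround from this thread.  Guarded so it is a
# no-op on hosts without the Lustre tools installed.
if command -v lctl >/dev/null 2>&1; then
    # Age out cached LDLM locks sooner (10000 is the value used in this
    # post; units vary by Lustre version), then drop the current LRU now.
    lctl set_param ldlm.namespaces.lfs*.lru_max_age=10000
    lctl set_param ldlm.namespaces.lfs*.lru_size=clear
    echo "ldlm parameters applied"
else
    echo "lctl not available; skipping"
fi
```

Note this only shrinks ldlm_locks and its associated allocations; as noted
above, I did not find an equivalent knob for signal_cache.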

Regards,
Nathan

