<div dir="ltr"><div dir="ltr"><div>Wow, thanks Nathan and NeilBrown. <br></div><div>It is great to learn about slub merging. It is awesome to have a reproducer. <br></div><div>I have yet to trigger my original problem with slub_nomerge, but<br></div><div>the slabinfo tool (in the kernel sources) can show merged caches: <br></div><div>kernel/3.10.0-693.5.2.el7/tools/slabinfo  -a<br><br>:t-0000112   <- sysfs_dir_cache kernfs_node_cache blkdev_integrity task_delay_info<br>:t-0000144   <- flow_cache cl_env_kmem<br>:t-0000160   <- sigqueue lov_object_kmem<br>:t-0000168   <- lovsub_object_kmem osc_extent_kmem<br>:t-0000176   <- vvp_object_kmem nfsd4_stateids<br>:t-0000192   <- ldlm_resources kiocb cred_jar inet_peer_cache key_jar file_lock_cache kmalloc-192 dmaengine-unmap-16 bio_integrity_payload<br>:t-0000216   <- vvp_session_kmem vm_area_struct<br>:t-0000256   <- biovec-16 ip_dst_cache bio-0 ll_file_data kmalloc-256 sgpool-8 filp request_sock_TCP rpc_tasks request_sock_TCPv6 skbuff_head_cache pool_workqueue lov_thread_kmem<br>:t-0000264   <- osc_lock_kmem numa_policy<br>:t-0000328   <- osc_session_kmem taskstats<br>:t-0000576   <- kioctx xfrm_dst_cache vvp_thread_kmem<br>:t-0001152   <- signal_cache lustre_inode_cache<br><br></div><div>This output is not from a machine that had the problem I described before, but the kernel version is the same, so I am assuming the cache merges are the same. <br></div><div><br></div><div>It looks like signal_cache is merged with lustre_inode_cache, so the growth reported under signal_cache may actually be Lustre inodes. <br></div><div>Regards.</div><div>Jacek Tomaka<br></div><div><br></div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Apr 25, 2019 at 7:42 AM NeilBrown <<a href="mailto:neilb@suse.com">neilb@suse.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><br>

Hi,<br>

 you seem to be able to reproduce this fairly easily.<br>

 If so, could you please boot with the "slub_nomerge" kernel parameter<br>

 and then reproduce the (apparent) memory leak.<br>

 I'm hoping that this will show some other slab that is actually using<br>

 the memory - a slab with very similar object-size to signal_cache that<br>

 is, by default, being merged with signal_cache.<br>

<br>

Thanks,<br>

NeilBrown<br>

<br>

<br>

On Wed, Apr 24 2019, Nathan Dauchy - NOAA Affiliate wrote:<br>

<br>

> On Mon, Apr 15, 2019 at 9:18 PM Jacek Tomaka <<a href="mailto:jacekt@dug.com" target="_blank">jacekt@dug.com</a>> wrote:<br>

><br>

>><br>

>> >signal_cache should have one entry for each process (or thread-group).<br>

>><br>

>> That is what i thought as well, looking at the kernel source, allocations<br>

>> from<br>

>> signal_cache happen only during fork.<br>

>><br>

>><br>

> I was recently chasing an issue with clients suffering from low memory and<br>

> saw that "signal_cache" was a major player.  But the workload on those<br>

> clients was not doing a lot of forking.  (and I don't *think* threading<br>

> either)  Rather it was a LOT of metadata read operations.<br>

><br>

> You can see the symptoms by a simple "du" on a Lustre file system:<br>

><br>

> # grep signal_cache /proc/slabinfo<br>

> signal_cache         967   1092   1152   28    8 : tunables    0    0    0<br>

> : slabdata     39     39      0<br>

><br>

> # du -s /mnt/lfs1/projects/foo<br>

> 339744908 /mnt/lfs1/projects/foo<br>

><br>

> # grep signal_cache /proc/slabinfo<br>

> signal_cache      164724 164724   1152   28    8 : tunables    0    0    0<br>

> : slabdata   5883   5883      0<br>

><br>

> # slabtop -s c -o | head -n 20<br>

>  Active / Total Objects (% used)    : 3660791 / 3662863 (99.9%)<br>

>  Active / Total Slabs (% used)      : 93019 / 93019 (100.0%)<br>

>  Active / Total Caches (% used)     : 72 / 107 (67.3%)<br>

>  Active / Total Size (% used)       : 836474.91K / 837502.16K (99.9%)<br>

>  Minimum / Average / Maximum Object : 0.01K / 0.23K / 12.75K<br>

><br>

>   OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME<br>

><br>

> 164724 164724 100%    1.12K   5883       28    188256K signal_cache<br>

><br>

> 331712 331712 100%    0.50K  10366       32    165856K ldlm_locks<br>

><br>

> 656896 656896 100%    0.12K  20528       32     82112K kmalloc-128<br>

><br>

> 340200 339971  99%    0.19K   8100       42     64800K kmalloc-192<br>

><br>

> 162838 162838 100%    0.30K   6263       26     50104K osc_object_kmem<br>

><br>

> 744192 744192 100%    0.06K  11628       64     46512K kmalloc-64<br>

><br>

> 205128 205128 100%    0.19K   4884       42     39072K dentry<br>

><br>

>   4268   4256  99%    8.00K   1067        4     34144K kmalloc-8192<br>

><br>

> 162978 162978 100%    0.17K   3543       46     28344K vvp_object_kmem<br>

><br>

> 162792 162792 100%    0.16K   6783       24     27132K kvm_mmu_page_header<br>

><br>

> 162825 162825 100%    0.16K   6513       25     26052K sigqueue<br>

><br>

>  16368  16368 100%    1.02K    528       31     16896K nfs_inode_cache<br>

><br>

>  20385  20385 100%    0.58K    755       27     12080K inode_cache<br>

><br>

><br>

> Repeat that for more (and bigger) directories and slab cache added up to<br>

> more than half the memory on this 24GB node.<br>

><br>

> This is with CentOS-7.6 and lustre-2.10.5_ddn6.<br>

><br>

> I worked around the problem by tackling the "ldlm_locks" memory usage with:<br>

> # lctl set_param ldlm.namespaces.lfs*.lru_max_age=10000<br>

><br>

> ...but I did not find a way to reduce the "signal_cache".<br>

><br>

> Regards,<br>

> Nathan<br>

</blockquote></div><br clear="all"><br>-- <br><div dir="ltr" class="gmail_signature"><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><span><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><font color="#000000"><font face="arial,helvetica,sans-serif"><b>Jacek Tomaka</b></font></font><br><font color="#000000"><font face="arial,helvetica,sans-serif"><font size="2">Geophysical Software Developer</font></font></font><br></div><font face="arial,helvetica,sans-serif" color="#000000">

</font><div><span lang="EN-US"></span> <span lang="EN-US"><b><br><br></b></span></div><font face="arial,helvetica,sans-serif" color="#000000">

</font><img src="http://drive.google.com/uc?export=view&id=0B4X9ixpc-ZU_NHV0WnluaXp5ZkE"><br><br><span style="color:rgb(102,102,102)"><font size="2"><b>DownUnder GeoSolutions<br><br></b></font></span><div><span style="color:rgb(102,102,102)"></span><span style="color:rgb(102,102,102)">76 Kings Park Road<br></span></div><span style="color:rgb(102,102,102)">West Perth 6005 WA, Australia<br><i><b>tel </b></i><a href="tel:+61%208%209287%204143" value="+61892874143" target="_blank">+61 8 9287 4143</a><br><a href="mailto:jacekt@dug.com" target="_blank">jacekt@dug.com</a><br><b><a href="http://www.dug.com" target="_blank">www.dug.com</a></b></span></div></div></div></div></div></div></span></div></div></div></div></div></div>
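<div>P.S. For anyone who wants to check a cache name against the "slabinfo -a" listing at the top of this mail without reading it by eye, here is a minimal sketch. It assumes the output format is exactly as pasted above (an alias, then "<-", then space-separated cache names). The function names parse_merged_caches and merged_with are my own for illustration and are not part of any kernel tool.</div>
<pre>
```python
def parse_merged_caches(text):
    """Map each alias (e.g. ':t-0001152') to the cache names merged under it."""
    merged = {}
    for line in text.splitlines():
        alias, sep, names = line.partition("<-")
        if not sep:
            continue  # skip lines that are not alias listings
        merged[alias.strip()] = names.split()
    return merged


def merged_with(merged, cache):
    """Return the other caches that share a slab with `cache` (empty if none)."""
    for names in merged.values():
        if cache in names:
            return [n for n in names if n != cache]
    return []


# Two lines copied from the slabinfo -a output in this mail.
sample = """\
:t-0000160   <- sigqueue lov_object_kmem
:t-0001152   <- signal_cache lustre_inode_cache
"""

print(merged_with(parse_merged_caches(sample), "signal_cache"))
# prints ['lustre_inode_cache']
```
</pre>
<div>Feeding it the full listing in place of the two sample lines should work the same way, as long as each alias stays on one line.</div>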