> so lustre_inode_cache is the real culprit when signal_cache appears to
> be large.
> This cache is slaved on the common inode cache, so there should be one
> entry for each lustre inode that is in memory.
> These inodes should get pruned when they've been inactive for a while.

What triggers the pruning?

> If you look in /proc/sys/fs/inode-nr there should be two numbers:
> The first is the total number of in-memory inodes for all filesystems.
> The second is the number of "unused" inodes.
>
> When you write "3" to drop_caches, the second number should drop down to
> nearly zero (I get 95 on my desktop, down from 6524).

Ok, that is useful to know, but echoing 3 to drop_caches (or generating
memory pressure) clears most of the signal_cache (inode) entries as well
as the other lustre objects, so that part is working fine.

The issue that remains is that they are accounted as SUnreclaim rather
than SReclaimable, so I do not think there is a memory leak per se.
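
For reference, the accounting can be inspected directly (stock CentOS 7
paths; per the slabinfo output quoted below, /sys/kernel/slab/signal_cache
is really the merged :t-0001152 cache):

# grep -E 'SReclaimable|SUnreclaim' /proc/meminfo
# cat /sys/kernel/slab/signal_cache/reclaim_account
0

A 0 in reclaim_account means every object in the merged cache, lustre
inodes included, is counted under SUnreclaim.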

Regards.
Jacek Tomaka

On Mon, Apr 29, 2019 at 1:39 PM NeilBrown <neilb@suse.com> wrote:
Thanks Jacek,
so lustre_inode_cache is the real culprit when signal_cache appears to
be large.
This cache is slaved on the common inode cache, so there should be one
entry for each lustre inode that is in memory.
These inodes should get pruned when they've been inactive for a while.

If you look in /proc/sys/fs/inode-nr there should be two numbers:
The first is the total number of in-memory inodes for all filesystems.
The second is the number of "unused" inodes.

When you write "3" to drop_caches, the second number should drop down to
nearly zero (I get 95 on my desktop, down from 6524).
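
Concretely (as root, since drop_caches needs write access):

  cat /proc/sys/fs/inode-nr
  echo 3 > /proc/sys/vm/drop_caches
  cat /proc/sys/fs/inode-nr

and the second field should collapse between the two reads.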

When signal_cache stays large even after the drop_caches, it suggests
that there are lots of lustre inodes that are thought to be still
active. I'd have to do a bit of digging to understand what that means,
and a lot more to work out why lustre is holding on to inodes longer
than you would expect (if that actually is the case).

If an inode still has cached data pages attached that cannot easily be
removed, it will not be purged even if it is unused.
So if you see the "unused" number remaining high even after a
"drop_caches", that might mean that lustre isn't letting go of cache
pages for some reason.
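
If you want to check whether that is happening, the llite layer exports
its cache usage; I believe (exact output varies between lustre versions)
something like:

  lctl get_param llite.*.max_cached_mb

will report the used and unused cached pages per mount.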

NeilBrown



On Mon, Apr 29 2019, Jacek Tomaka wrote:

> Wow, thanks Nathan and NeilBrown.
> It is great to learn about slub merging. It is awesome to have a
> reproducer.
> I am yet to trigger my original problem with slub_nomerge, but the
> slabinfo tool (in the kernel sources) can actually show merged caches:
> kernel/3.10.0-693.5.2.el7/tools/slabinfo -a
>
> :t-0000112 <- sysfs_dir_cache kernfs_node_cache blkdev_integrity task_delay_info
> :t-0000144 <- flow_cache cl_env_kmem
> :t-0000160 <- sigqueue lov_object_kmem
> :t-0000168 <- lovsub_object_kmem osc_extent_kmem
> :t-0000176 <- vvp_object_kmem nfsd4_stateids
> :t-0000192 <- ldlm_resources kiocb cred_jar inet_peer_cache key_jar file_lock_cache kmalloc-192 dmaengine-unmap-16 bio_integrity_payload
> :t-0000216 <- vvp_session_kmem vm_area_struct
> :t-0000256 <- biovec-16 ip_dst_cache bio-0 ll_file_data kmalloc-256 sgpool-8 filp request_sock_TCP rpc_tasks request_sock_TCPv6 skbuff_head_cache pool_workqueue lov_thread_kmem
> :t-0000264 <- osc_lock_kmem numa_policy
> :t-0000328 <- osc_session_kmem taskstats
> :t-0000576 <- kioctx xfrm_dst_cache vvp_thread_kmem
> :t-0001152 <- signal_cache lustre_inode_cache
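> (In case anyone wants to run it: the tool is not built by default; as far
> as I know compiling tools/vm/slabinfo.c from the matching kernel source
> with gcc is enough, though the path may differ between kernel trees.)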
>
> It is not on a machine that had the problem I described before, but the
> kernel version is the same, so I am assuming the cache merges are the same.
>
> Looks like signal_cache points to lustre_inode_cache.
> Regards.
> Jacek Tomaka
>
>
> On Thu, Apr 25, 2019 at 7:42 AM NeilBrown <neilb@suse.com> wrote:
>
>>
>> Hi,
>> you seem to be able to reproduce this fairly easily.
>> If so, could you please boot with the "slub_nomerge" kernel parameter
>> and then reproduce the (apparent) memory leak.
>> I'm hoping that this will show some other slab that is actually using
>> the memory - a slab with very similar object-size to signal_cache that
>> is, by default, being merged with signal_cache.
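>> (On CentOS 7 that means roughly: append "slub_nomerge" to
>> GRUB_CMDLINE_LINUX in /etc/default/grub, regenerate the config with
>> "grub2-mkconfig -o /boot/grub2/grub.cfg" (the path differs on EFI
>> systems), and reboot.)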
>>
>> Thanks,
>> NeilBrown
>>
>>
>> On Wed, Apr 24 2019, Nathan Dauchy - NOAA Affiliate wrote:
>>
>> > On Mon, Apr 15, 2019 at 9:18 PM Jacek Tomaka <jacekt@dug.com> wrote:
>> >
>> >>
>> >> > signal_cache should have one entry for each process (or thread-group).
>> >>
>> >> That is what I thought as well; looking at the kernel source,
>> >> allocations from signal_cache happen only during fork.
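>> >> (For reference, on 3.10 that allocation appears to be copy_signal() in
>> >> kernel/fork.c, via kmem_cache_zalloc(signal_cachep, GFP_KERNEL); as far
>> >> as I can tell that is the only allocation site for this cache.)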
>> >>
>> >>
>> > I was recently chasing an issue with clients suffering from low memory
>> > and saw that "signal_cache" was a major player. But the workload on
>> > those clients was not doing a lot of forking (and I don't *think*
>> > threading either). Rather it was a LOT of metadata read operations.
>> >
>> > You can see the symptoms by a simple "du" on a Lustre file system:
>> >
>> > # grep signal_cache /proc/slabinfo
>> > signal_cache      967   1092   1152   28    8 : tunables 0 0 0 : slabdata     39     39      0
>> >
>> > # du -s /mnt/lfs1/projects/foo
>> > 339744908   /mnt/lfs1/projects/foo
>> >
>> > # grep signal_cache /proc/slabinfo
>> > signal_cache   164724 164724   1152   28    8 : tunables 0 0 0 : slabdata   5883   5883      0
>> >
>> > # slabtop -s c -o | head -n 20
>> >  Active / Total Objects (% used)    : 3660791 / 3662863 (99.9%)
>> >  Active / Total Slabs (% used)      : 93019 / 93019 (100.0%)
>> >  Active / Total Caches (% used)     : 72 / 107 (67.3%)
>> >  Active / Total Size (% used)       : 836474.91K / 837502.16K (99.9%)
>> >  Minimum / Average / Maximum Object : 0.01K / 0.23K / 12.75K
>> >
>> >   OBJS  ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
>> > 164724  164724 100%    1.12K   5883       28    188256K signal_cache
>> > 331712  331712 100%    0.50K  10366       32    165856K ldlm_locks
>> > 656896  656896 100%    0.12K  20528       32     82112K kmalloc-128
>> > 340200  339971  99%    0.19K   8100       42     64800K kmalloc-192
>> > 162838  162838 100%    0.30K   6263       26     50104K osc_object_kmem
>> > 744192  744192 100%    0.06K  11628       64     46512K kmalloc-64
>> > 205128  205128 100%    0.19K   4884       42     39072K dentry
>> >   4268    4256  99%    8.00K   1067        4     34144K kmalloc-8192
>> > 162978  162978 100%    0.17K   3543       46     28344K vvp_object_kmem
>> > 162792  162792 100%    0.16K   6783       24     27132K kvm_mmu_page_header
>> > 162825  162825 100%    0.16K   6513       25     26052K sigqueue
>> >  16368   16368 100%    1.02K    528       31     16896K nfs_inode_cache
>> >  20385   20385 100%    0.58K    755       27     12080K inode_cache
>> >
>> > Repeat that for more (and bigger) directories, and the slab caches add
>> > up to more than half the memory on this 24GB node.
>> >
>> > This is with CentOS-7.6 and lustre-2.10.5_ddn6.
>> >
>> > I worked around the problem by tackling the "ldlm_locks" memory usage with:
>> > # lctl set_param ldlm.namespaces.lfs*.lru_max_age=10000
>> >
>> > ...but I did not find a way to reduce the "signal_cache".
>> >
>> > Regards,
>> > Nathan
>>
>
>
> -- 
> *Jacek Tomaka*
> Geophysical Software Developer
>
> *DownUnder GeoSolutions*
> 76 Kings Park Road
> West Perth 6005 WA, Australia
> *tel* +61 8 9287 4143
> jacekt@dug.com
> *www.dug.com* <http://www.dug.com>

-- 
Jacek Tomaka
Geophysical Software Developer

DownUnder GeoSolutions
76 Kings Park Road
West Perth 6005 WA, Australia
tel +61 8 9287 4143
jacekt@dug.com
www.dug.com