[lustre-discuss] Lustre client memory and MemoryAvailable
Jacek Tomaka
jacekt at dug.com
Mon Apr 29 01:27:50 PDT 2019
> so lustre_inode_cache is the real culprit when signal_cache appears to
> be large.
> This cache is slaved on the common inode cache, so there should be one
> entry for each lustre inode that is in memory.
> These inodes should get pruned when they've been inactive for a while.
What triggers the pruning?
>If you look in /proc/sys/fs/inode-nr there should be two numbers:
> The first is the total number of in-memory inodes for all filesystems.
> The second is the number of "unused" inodes.
>
> When you write "3" to drop_caches, the second number should drop down to
> nearly zero (I get 95 on my desktop, down from 6524).
Ok, that is useful to know, but echoing 3 to drop_caches (or generating
memory pressure) clears most of the signal_cache (inode) entries as well as
the other lustre objects, so that part is working fine.
The issue that remains is that they are accounted as SUnreclaim rather than
SReclaimable.
So I do not think there is a memory leak per se.
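
For reference, the check itself is simple (a minimal sketch; these are the
standard procfs paths, and the exact slab names may differ between kernels):

# before: total vs "unused" inodes, slab accounting, and the merged cache
cat /proc/sys/fs/inode-nr
grep -E 'SReclaimable|SUnreclaim' /proc/meminfo
grep signal_cache /proc/slabinfo

# drop pagecache, dentries and inodes
sync
echo 3 > /proc/sys/vm/drop_caches

# after: the second number in inode-nr should be near zero
cat /proc/sys/fs/inode-nr
grep -E 'SReclaimable|SUnreclaim' /proc/meminfo
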
Regards.
Jacek Tomaka
On Mon, Apr 29, 2019 at 1:39 PM NeilBrown <neilb at suse.com> wrote:
>
> Thanks Jacek,
> so lustre_inode_cache is the real culprit when signal_cache appears to
> be large.
> This cache is slaved on the common inode cache, so there should be one
> entry for each lustre inode that is in memory.
> These inodes should get pruned when they've been inactive for a while.
>
> If you look in /proc/sys/fs/inode-nr there should be two numbers:
> The first is the total number of in-memory inodes for all filesystems.
> The second is the number of "unused" inodes.
>
> When you write "3" to drop_caches, the second number should drop down to
> nearly zero (I get 95 on my desktop, down from 6524).
>
> When signal_cache stays large even after the drop_caches, it suggests
> that there are lots of lustre inodes that are thought to be still
> active. I'd have to do a bit of digging to understand what that means,
> and a lot more to work out why lustre is holding on to inodes longer
> than you would expect (if that actually is the case).
>
> If an inode still has cached data pages attached that cannot easily be
> removed, it will not be purged even if it is unused.
> So if you see the "unused" number remaining high even after a
> "drop_caches", that might mean that lustre isn't letting go of cache
> pages for some reason.
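>
> One quick way to check for that (a sketch, assuming a 2.x client that
> exposes this llite parameter) is to look at how much page cache the
> client is holding per filesystem:
>
> # lctl get_param llite.*.max_cached_mb
>
> If the "used" figure there stays high across a drop_caches, that would
> point at pinned cache pages.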
>
> NeilBrown
>
>
>
> On Mon, Apr 29 2019, Jacek Tomaka wrote:
>
> > Wow, thanks Nathan and NeilBrown.
> > It is great to learn about slub merging. It is awesome to have a
> > reproducer.
> > I am yet to trigger my original problem with slub_nomerge, but the
> > slabinfo tool (in the kernel sources) can actually show merged caches:
> > kernel/3.10.0-693.5.2.el7/tools/slabinfo -a
> >
> > :t-0000112 <- sysfs_dir_cache kernfs_node_cache blkdev_integrity
> > task_delay_info
> > :t-0000144 <- flow_cache cl_env_kmem
> > :t-0000160 <- sigqueue lov_object_kmem
> > :t-0000168 <- lovsub_object_kmem osc_extent_kmem
> > :t-0000176 <- vvp_object_kmem nfsd4_stateids
> > :t-0000192 <- ldlm_resources kiocb cred_jar inet_peer_cache key_jar
> > file_lock_cache kmalloc-192 dmaengine-unmap-16 bio_integrity_payload
> > :t-0000216 <- vvp_session_kmem vm_area_struct
> > :t-0000256 <- biovec-16 ip_dst_cache bio-0 ll_file_data kmalloc-256
> > sgpool-8 filp request_sock_TCP rpc_tasks request_sock_TCPv6
> > skbuff_head_cache pool_workqueue lov_thread_kmem
> > :t-0000264 <- osc_lock_kmem numa_policy
> > :t-0000328 <- osc_session_kmem taskstats
> > :t-0000576 <- kioctx xfrm_dst_cache vvp_thread_kmem
> > :t-0001152 <- signal_cache lustre_inode_cache
> >
> > It is not on a machine that had the problem I described before, but the
> > kernel version is the same, so I am assuming the cache merges are the
> > same.
> >
> > Looks like signal_cache points to lustre_inode_cache.
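> >
> > A cheap way to confirm that aliasing without the slabinfo tool (assuming
> > a SLUB kernel with /sys/kernel/slab available) is to compare the sysfs
> > entries; merged caches are symlinks to the same alias directory:
> >
> > # ls -l /sys/kernel/slab/signal_cache /sys/kernel/slab/lustre_inode_cache
> >
> > Both should resolve to the same :t-0001152 entry while they are merged.
> >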
> > Regards.
> > Jacek Tomaka
> >
> >
> > On Thu, Apr 25, 2019 at 7:42 AM NeilBrown <neilb at suse.com> wrote:
> >
> >>
> >> Hi,
> >> you seem to be able to reproduce this fairly easily.
> >> If so, could you please boot with the "slub_nomerge" kernel parameter
> >> and then reproduce the (apparent) memory leak.
> >> I'm hoping that this will show some other slab that is actually using
> >> the memory - a slab with very similar object-size to signal_cache that
> >> is, by default, being merged with signal_cache.
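> >>
> >> (One way to set that on a CentOS 7 node, assuming grubby is installed
> >> as it usually is, would be:
> >>
> >> # grubby --update-kernel=ALL --args="slub_nomerge"
> >>
> >> followed by a reboot; or append it by hand to the kernel command line
> >> in the grub config.)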
> >>
> >> Thanks,
> >> NeilBrown
> >>
> >>
> >> On Wed, Apr 24 2019, Nathan Dauchy - NOAA Affiliate wrote:
> >>
> >> > On Mon, Apr 15, 2019 at 9:18 PM Jacek Tomaka <jacekt at dug.com> wrote:
> >> >
> >> >>
> >> >> >signal_cache should have one entry for each process (or
> >> >> >thread-group).
> >> >>
> >> >> That is what I thought as well; looking at the kernel source,
> >> >> allocations from signal_cache happen only during fork.
> >> >>
> >> >>
> >> > I was recently chasing an issue with clients suffering from low memory
> >> > and saw that "signal_cache" was a major player. But the workload on
> >> > those clients was not doing a lot of forking (and I don't *think*
> >> > threading either). Rather, it was a LOT of metadata read operations.
> >> >
> >> > You can see the symptoms by a simple "du" on a Lustre file system:
> >> >
> >> > # grep signal_cache /proc/slabinfo
> >> > signal_cache         967   1092   1152   28    8 : tunables  0 0 0 : slabdata     39     39      0
> >> >
> >> > # du -s /mnt/lfs1/projects/foo
> >> > 339744908   /mnt/lfs1/projects/foo
> >> >
> >> > # grep signal_cache /proc/slabinfo
> >> > signal_cache      164724 164724   1152   28    8 : tunables  0 0 0 : slabdata   5883   5883      0
> >> >
> >> > # slabtop -s c -o | head -n 20
> >> >  Active / Total Objects (% used)    : 3660791 / 3662863 (99.9%)
> >> >  Active / Total Slabs (% used)      : 93019 / 93019 (100.0%)
> >> >  Active / Total Caches (% used)     : 72 / 107 (67.3%)
> >> >  Active / Total Size (% used)       : 836474.91K / 837502.16K (99.9%)
> >> >  Minimum / Average / Maximum Object : 0.01K / 0.23K / 12.75K
> >> >
> >> >   OBJS  ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
> >> > 164724  164724 100%    1.12K   5883       28    188256K signal_cache
> >> > 331712  331712 100%    0.50K  10366       32    165856K ldlm_locks
> >> > 656896  656896 100%    0.12K  20528       32     82112K kmalloc-128
> >> > 340200  339971  99%    0.19K   8100       42     64800K kmalloc-192
> >> > 162838  162838 100%    0.30K   6263       26     50104K osc_object_kmem
> >> > 744192  744192 100%    0.06K  11628       64     46512K kmalloc-64
> >> > 205128  205128 100%    0.19K   4884       42     39072K dentry
> >> >   4268    4256  99%    8.00K   1067        4     34144K kmalloc-8192
> >> > 162978  162978 100%    0.17K   3543       46     28344K vvp_object_kmem
> >> > 162792  162792 100%    0.16K   6783       24     27132K kvm_mmu_page_header
> >> > 162825  162825 100%    0.16K   6513       25     26052K sigqueue
> >> >  16368   16368 100%    1.02K    528       31     16896K nfs_inode_cache
> >> >  20385   20385 100%    0.58K    755       27     12080K inode_cache
> >> >
> >> > Repeat that for more (and bigger) directories, and slab cache added up
> >> > to more than half the memory on this 24GB node.
> >> >
> >> > This is with CentOS-7.6 and lustre-2.10.5_ddn6.
> >> >
> >> > I worked around the problem by tackling the "ldlm_locks" memory usage
> >> > with:
> >> > # lctl set_param ldlm.namespaces.lfs*.lru_max_age=10000
> >> >
> >> > ...but I did not find a way to reduce the "signal_cache".
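> >> >
> >> > (If anyone wants to verify the effect of lowering lru_max_age, the
> >> > per-namespace lock counts should shrink; assuming the usual ldlm
> >> > params are present, compare
> >> > # lctl get_param ldlm.namespaces.*.lock_count
> >> > before and after the change.)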
> >> >
> >> > Regards,
> >> > Nathan
> >>
> >
> >
> > --
> > *Jacek Tomaka*
> > Geophysical Software Developer
> >
> > *DownUnder GeoSolutions*
> > 76 Kings Park Road
> > West Perth 6005 WA, Australia
> > *tel* +61 8 9287 4143
> > jacekt at dug.com
> > *www.dug.com*
>
--
*Jacek Tomaka*
Geophysical Software Developer
*DownUnder GeoSolutions*
76 Kings Park Road
West Perth 6005 WA, Australia
*tel* +61 8 9287 4143
jacekt at dug.com
*www.dug.com*