[lustre-discuss] Lustre client memory and MemoryAvailable
NeilBrown
neilb at suse.com
Sun Apr 28 22:39:01 PDT 2019
Thanks Jacek,
so lustre_inode_cache is the real culprit when signal_cache appears to
be large.
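On a SLUB kernel with cache merging enabled, the name of each mergeable cache under /sys/kernel/slab is normally a symbolic link to a shared directory with a generated name such as :t-0001152. This is the same grouping that `slabinfo -a` prints. The following is a minimal Python sketch, assuming that sysfs layout. The function names are hypothetical helpers, not an existing tool.

```python
import os
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

Entry = Tuple[str, Optional[str]]  # (cache name, symlink target or None)


def read_slab_entries(slab_dir: str = "/sys/kernel/slab") -> List[Entry]:
    """List every entry in slab_dir with the basename of its symlink target."""
    entries: List[Entry] = []
    for name in sorted(os.listdir(slab_dir)):
        path = os.path.join(slab_dir, name)
        # Real directories (unmerged caches, :t-... dirs) get None as target
        target = os.path.basename(os.readlink(path)) if os.path.islink(path) else None
        entries.append((name, target))
    return entries


def merged_groups(entries: Iterable[Entry]) -> Dict[str, List[str]]:
    """Group alias names by shared target; keep only targets with 2+ names."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name, target in entries:
        if target is not None:
            groups[target].append(name)
    return {t: sorted(names) for t, names in groups.items() if len(names) > 1}


def merged_with(cache: str, groups: Dict[str, List[str]]) -> List[str]:
    """Return the other cache names that share a slab with `cache`."""
    for names in groups.values():
        if cache in names:
            return [n for n in names if n != cache]
    return []
```

On an affected client, `merged_with("signal_cache", merged_groups(read_slab_entries()))` should list lustre_inode_cache if the merge matches Jacek's `slabinfo -a` output. After booting with slub_nomerge, each cache should get its own directory, so the list should come back empty.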

The lustre_inode_cache is tied to the common inode cache, so there should
be one entry for each Lustre inode that is in memory.
These inodes should get pruned when they have been inactive for a while.

If you look in /proc/sys/fs/inode-nr there should be two numbers:
the first is the total number of in-memory inodes for all filesystems;
the second is the number of "unused" inodes.
When you write "3" to /proc/sys/vm/drop_caches, the second number should
drop to nearly zero (I get 95 on my desktop, down from 6524).
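A minimal Python sketch of that check follows, assuming the two-field layout described above. The helper names (InodeCounts, parse_inode_nr, read_inode_nr) are hypothetical and are not part of any kernel or Lustre tool.

```python
from typing import NamedTuple


class InodeCounts(NamedTuple):
    total: int   # first field: in-memory inodes across all filesystems
    unused: int  # second field: inodes on the unused (reclaimable) list

    @property
    def in_use(self) -> int:
        # Inodes not on the unused list; drop_caches does not reclaim these
        return self.total - self.unused


def parse_inode_nr(text: str) -> InodeCounts:
    """Parse the contents of /proc/sys/fs/inode-nr ("<total> <unused>")."""
    total, unused = (int(field) for field in text.split()[:2])
    return InodeCounts(total, unused)


def read_inode_nr(path: str = "/proc/sys/fs/inode-nr") -> InodeCounts:
    with open(path) as f:
        return parse_inode_nr(f.read())
```

Call read_inode_nr() once before and once after running `sync; echo 3 > /proc/sys/vm/drop_caches` as root. If `unused` stays high afterwards, some inodes are not being released even though nothing references them.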

When signal_cache stays large even after the drop_caches, it suggests
that there are lots of Lustre inodes that are thought to be still
active. I'd have to do a bit of digging to understand what that means,
and a lot more to work out why Lustre is holding on to inodes longer
than you would expect (if that actually is the case).
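One way to watch this is to read the signal_cache line from /proc/slabinfo before and after the drop_caches. The following is a minimal Python sketch, assuming the slabinfo 2.1 line layout shown in Nathan's output. SlabStats and parse_slabinfo are hypothetical names. The size_kib() method multiplies slabs by pages per slab by page size, with 4 KiB assumed. That product matches the CACHE SIZE column in Nathan's slabtop output: 5883 slabs * 8 pages * 4 KiB = 188256 KiB.

```python
from typing import Dict, NamedTuple


class SlabStats(NamedTuple):
    name: str
    active_objs: int
    num_objs: int
    objsize: int       # bytes per object
    objperslab: int
    pagesperslab: int
    active_slabs: int
    num_slabs: int

    def size_kib(self, page_size: int = 4096) -> int:
        # Memory held by all slabs of this cache, whether objects are used or not
        return self.num_slabs * self.pagesperslab * page_size // 1024


def parse_slabinfo(text: str) -> Dict[str, SlabStats]:
    """Parse /proc/slabinfo (version 2.1) text into SlabStats keyed by name."""
    stats: Dict[str, SlabStats] = {}
    for line in text.splitlines():
        # Skip the "slabinfo - version" banner, the "# name ..." header and blanks
        if not line.strip() or line.startswith(("slabinfo -", "#")):
            continue
        fields = line.split()
        # fields[1:6] = active_objs num_objs objsize objperslab pagesperslab;
        # after the "slabdata" marker come active_slabs num_slabs sharedavail
        sd = fields.index("slabdata")
        stats[fields[0]] = SlabStats(
            fields[0],
            *(int(f) for f in fields[1:6]),
            int(fields[sd + 1]),
            int(fields[sd + 2]),
        )
    return stats
```

Reading /proc/slabinfo usually requires root. The page size can be passed in from os.sysconf("SC_PAGE_SIZE") instead of assumed. When caches are merged, the signal_cache figures also count lustre_inode_cache objects.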

If an inode still has cached data pages attached that cannot easily be
removed, it will not be purged even if it is unused.
So if you see the "unused" number remaining high even after a
"drop_caches", that might mean that Lustre isn't letting go of cache
pages for some reason.
NeilBrown
On Mon, Apr 29 2019, Jacek Tomaka wrote:
> Wow, thanks Nathan and NeilBrown.
> It is great to learn about slub merging. It is awesome to have a
> reproducer.
> I am yet to trigger my original problem with slub_nomerge, but the
> slabinfo tool (in the kernel sources) can actually show merged caches:
> kernel/3.10.0-693.5.2.el7/tools/slabinfo -a
>
> :t-0000112 <- sysfs_dir_cache kernfs_node_cache blkdev_integrity task_delay_info
> :t-0000144 <- flow_cache cl_env_kmem
> :t-0000160 <- sigqueue lov_object_kmem
> :t-0000168 <- lovsub_object_kmem osc_extent_kmem
> :t-0000176 <- vvp_object_kmem nfsd4_stateids
> :t-0000192 <- ldlm_resources kiocb cred_jar inet_peer_cache key_jar file_lock_cache kmalloc-192 dmaengine-unmap-16 bio_integrity_payload
> :t-0000216 <- vvp_session_kmem vm_area_struct
> :t-0000256 <- biovec-16 ip_dst_cache bio-0 ll_file_data kmalloc-256 sgpool-8 filp request_sock_TCP rpc_tasks request_sock_TCPv6 skbuff_head_cache pool_workqueue lov_thread_kmem
> :t-0000264 <- osc_lock_kmem numa_policy
> :t-0000328 <- osc_session_kmem taskstats
> :t-0000576 <- kioctx xfrm_dst_cache vvp_thread_kmem
> :t-0001152 <- signal_cache lustre_inode_cache
>
> This is not on the machine that had the problem I described before, but
> the kernel version is the same, so I am assuming the cache merges are the same.
>
> Looks like signal_cache points to lustre_inode_cache.
> Regards.
> Jacek Tomaka
>
>
> On Thu, Apr 25, 2019 at 7:42 AM NeilBrown <neilb at suse.com> wrote:
>
>>
>> Hi,
>> you seem to be able to reproduce this fairly easily.
>> If so, could you please boot with the "slub_nomerge" kernel parameter
>> and then reproduce the (apparent) memory leak.
>> I'm hoping that this will show some other slab that is actually using
>> the memory - a slab with very similar object-size to signal_cache that
>> is, by default, being merged with signal_cache.
>>
>> Thanks,
>> NeilBrown
>>
>>
>> On Wed, Apr 24 2019, Nathan Dauchy - NOAA Affiliate wrote:
>>
>> > On Mon, Apr 15, 2019 at 9:18 PM Jacek Tomaka <jacekt at dug.com> wrote:
>> >
>> >>
>> >> > signal_cache should have one entry for each process (or thread-group).
>> >>
>> >> That is what I thought as well; looking at the kernel source, allocations
>> >> from signal_cache happen only during fork.
>> >>
>> >>
>> > I was recently chasing an issue with clients suffering from low memory and
>> > saw that "signal_cache" was a major player. But the workload on those
>> > clients was not doing a lot of forking (and I don't *think* threading
>> > either). Rather, it was doing a LOT of metadata read operations.
>> >
>> > You can see the symptoms with a simple "du" on a Lustre file system:
>> >
>> > # grep signal_cache /proc/slabinfo
>> > signal_cache 967 1092 1152 28 8 : tunables 0 0 0 : slabdata 39 39 0
>> >
>> > # du -s /mnt/lfs1/projects/foo
>> > 339744908 /mnt/lfs1/projects/foo
>> >
>> > # grep signal_cache /proc/slabinfo
>> > signal_cache 164724 164724 1152 28 8 : tunables 0 0 0 : slabdata 5883 5883 0
>> >
>> > # slabtop -s c -o | head -n 20
>> > Active / Total Objects (% used) : 3660791 / 3662863 (99.9%)
>> > Active / Total Slabs (% used) : 93019 / 93019 (100.0%)
>> > Active / Total Caches (% used) : 72 / 107 (67.3%)
>> > Active / Total Size (% used) : 836474.91K / 837502.16K (99.9%)
>> > Minimum / Average / Maximum Object : 0.01K / 0.23K / 12.75K
>> >
>> >   OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
>> > 164724 164724 100%    1.12K   5883       28    188256K signal_cache
>> > 331712 331712 100%    0.50K  10366       32    165856K ldlm_locks
>> > 656896 656896 100%    0.12K  20528       32     82112K kmalloc-128
>> > 340200 339971  99%    0.19K   8100       42     64800K kmalloc-192
>> > 162838 162838 100%    0.30K   6263       26     50104K osc_object_kmem
>> > 744192 744192 100%    0.06K  11628       64     46512K kmalloc-64
>> > 205128 205128 100%    0.19K   4884       42     39072K dentry
>> >   4268   4256  99%    8.00K   1067        4     34144K kmalloc-8192
>> > 162978 162978 100%    0.17K   3543       46     28344K vvp_object_kmem
>> > 162792 162792 100%    0.16K   6783       24     27132K kvm_mmu_page_header
>> > 162825 162825 100%    0.16K   6513       25     26052K sigqueue
>> >  16368  16368 100%    1.02K    528       31     16896K nfs_inode_cache
>> >  20385  20385 100%    0.58K    755       27     12080K inode_cache
>> >
>> > Repeating that for more (and bigger) directories, the slab caches added up
>> > to more than half the memory on this 24GB node.
>> >
>> > This is with CentOS-7.6 and lustre-2.10.5_ddn6.
>> >
>> > I worked around the problem by tackling the "ldlm_locks" memory usage with:
>> > # lctl set_param ldlm.namespaces.lfs*.lru_max_age=10000
>> >
>> > ...but I did not find a way to reduce the "signal_cache".
>> >
>> > Regards,
>> > Nathan
>>
>
>
> --
> *Jacek Tomaka*
> Geophysical Software Developer
>
> *DownUnder GeoSolutions*
> 76 Kings Park Road
> West Perth 6005 WA, Australia
> *tel *+61 8 9287 4143
> jacekt at dug.com
> *www.dug.com*