[lustre-discuss] Lustre client memory and MemoryAvailable

NeilBrown neilb at suse.com
Sun Apr 28 22:39:01 PDT 2019


Thanks Jacek,
 so lustre_inode_cache is the real culprit when signal_cache appears to
 be large.
 This cache is slaved to the common inode cache, so there should be one
 entry for each lustre inode that is in memory.
 These inodes should get pruned when they've been inactive for a while.

 If you look in /proc/sys/fs/inode-nr, there should be two numbers:
  The first is the total number of in-memory inodes for all filesystems.
  The second is the number of "unused" inodes.

 When you write "3" to drop_caches, the second number should drop down to
 nearly zero (I get 95 on my desktop, down from 6524).
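
 For example, checking before and after (the two totals here are just
 illustrative numbers; the "unused" values are the ones from my desktop):

   # cat /proc/sys/fs/inode-nr
   48217   6524
   # echo 3 > /proc/sys/vm/drop_caches
   # cat /proc/sys/fs/inode-nr
   41788   95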

 When signal_cache stays large even after the drop_caches, it suggests
 that there are lots of lustre inodes that are thought to be still
 active.   I'd have to do a bit of digging to understand what that means,
 and a lot more to work out why lustre is holding on to inodes longer
 than you would expect (if that actually is the case).

 If an inode still has cached data pages attached that cannot easily be
 removed, it will not be purged even if it is unused.
 So if you see the "unused" number remaining high even after a
 "drop_caches", that might mean that lustre isn't letting go of cache
 pages for some reason.
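
 One experiment that might narrow this down (just a sketch, and it will
 throw away cached data, so best done on an otherwise quiet client):
 cancel the client's unused LDLM locks, which should also release the
 pages they cover, then drop caches again and see whether the inode and
 signal_cache counts finally shrink:

   # lctl set_param ldlm.namespaces.*.lru_size=clear
   # echo 3 > /proc/sys/vm/drop_caches
   # grep signal_cache /proc/slabinfo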

NeilBrown

  

On Mon, Apr 29 2019, Jacek Tomaka wrote:

> Wow, Thanks Nathan and NeilBrown.
> It is great to learn about slub merging. It is awesome to have a
> reproducer.
> I am yet to trigger my original problem with slub_nomerge, but the
> slabinfo tool (in the kernel sources) can actually show merged caches:
> kernel/3.10.0-693.5.2.el7/tools/slabinfo  -a
>
> :t-0000112   <- sysfs_dir_cache kernfs_node_cache blkdev_integrity
> task_delay_info
> :t-0000144   <- flow_cache cl_env_kmem
> :t-0000160   <- sigqueue lov_object_kmem
> :t-0000168   <- lovsub_object_kmem osc_extent_kmem
> :t-0000176   <- vvp_object_kmem nfsd4_stateids
> :t-0000192   <- ldlm_resources kiocb cred_jar inet_peer_cache key_jar
> file_lock_cache kmalloc-192 dmaengine-unmap-16 bio_integrity_payload
> :t-0000216   <- vvp_session_kmem vm_area_struct
> :t-0000256   <- biovec-16 ip_dst_cache bio-0 ll_file_data kmalloc-256
> sgpool-8 filp request_sock_TCP rpc_tasks request_sock_TCPv6
> skbuff_head_cache pool_workqueue lov_thread_kmem
> :t-0000264   <- osc_lock_kmem numa_policy
> :t-0000328   <- osc_session_kmem taskstats
> :t-0000576   <- kioctx xfrm_dst_cache vvp_thread_kmem
> :t-0001152   <- signal_cache lustre_inode_cache
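>
> (For anyone else wanting to check their own machine: slabinfo ships with
> the kernel source as tools/vm/slabinfo.c and, as far as I remember, builds
> standalone, e.g.
>
> # gcc -O2 -o slabinfo tools/vm/slabinfo.c
> # ./slabinfo -a
>
> but check the tools/vm/ Makefile if that does not work.)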
>
> This is not from the machine that had the problem I described before, but
> the kernel version is the same, so I am assuming the cache merges are the same.
>
> Looks like signal_cache is merged with lustre_inode_cache.
> Regards.
> Jacek Tomaka
>
>
> On Thu, Apr 25, 2019 at 7:42 AM NeilBrown <neilb at suse.com> wrote:
>
>>
>> Hi,
>>  you seem to be able to reproduce this fairly easily.
>>  If so, could you please boot with the "slub_nomerge" kernel parameter
>>  and then reproduce the (apparent) memory leak?
>>  I'm hoping that this will show some other slab that is actually using
>>  the memory - a slab with very similar object-size to signal_cache that
>>  is, by default, being merged with signal_cache.
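>>
>>  (On a CentOS 7 client something like the following should add the
>>  parameter persistently; this assumes grubby is available, so adjust as
>>  needed for your boot setup:
>>
>>    # grubby --update-kernel=ALL --args="slub_nomerge"
>>    # reboot
>>    # cat /proc/cmdline        <- verify slub_nomerge is listed
>>  )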
>>
>> Thanks,
>> NeilBrown
>>
>>
>> On Wed, Apr 24 2019, Nathan Dauchy - NOAA Affiliate wrote:
>>
>> > On Mon, Apr 15, 2019 at 9:18 PM Jacek Tomaka <jacekt at dug.com> wrote:
>> >
>> >>
>> >> >signal_cache should have one entry for each process (or thread-group).
>> >>
>> >> That is what I thought as well; looking at the kernel source,
>> >> allocations from signal_cache happen only during fork.
>> >>
>> >>
>> > I was recently chasing an issue with clients suffering from low memory
>> > and saw that "signal_cache" was a major player.  But the workload on
>> > those clients was not doing a lot of forking (and I don't *think*
>> > threading either).  Rather, it was a LOT of metadata read operations.
>> >
>> > You can see the symptoms by a simple "du" on a Lustre file system:
>> >
>> > # grep signal_cache /proc/slabinfo
>> > signal_cache         967   1092   1152   28    8 : tunables    0    0    0 : slabdata     39     39      0
>> >
>> > # du -s /mnt/lfs1/projects/foo
>> > 339744908 /mnt/lfs1/projects/foo
>> >
>> > # grep signal_cache /proc/slabinfo
>> > signal_cache      164724 164724   1152   28    8 : tunables    0    0    0 : slabdata   5883   5883      0
>> >
>> > # slabtop -s c -o | head -n 20
>> >  Active / Total Objects (% used)    : 3660791 / 3662863 (99.9%)
>> >  Active / Total Slabs (% used)      : 93019 / 93019 (100.0%)
>> >  Active / Total Caches (% used)     : 72 / 107 (67.3%)
>> >  Active / Total Size (% used)       : 836474.91K / 837502.16K (99.9%)
>> >  Minimum / Average / Maximum Object : 0.01K / 0.23K / 12.75K
>> >
>> >   OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
>> > 164724 164724 100%    1.12K   5883       28    188256K signal_cache
>> > 331712 331712 100%    0.50K  10366       32    165856K ldlm_locks
>> > 656896 656896 100%    0.12K  20528       32     82112K kmalloc-128
>> > 340200 339971  99%    0.19K   8100       42     64800K kmalloc-192
>> > 162838 162838 100%    0.30K   6263       26     50104K osc_object_kmem
>> > 744192 744192 100%    0.06K  11628       64     46512K kmalloc-64
>> > 205128 205128 100%    0.19K   4884       42     39072K dentry
>> >   4268   4256  99%    8.00K   1067        4     34144K kmalloc-8192
>> > 162978 162978 100%    0.17K   3543       46     28344K vvp_object_kmem
>> > 162792 162792 100%    0.16K   6783       24     27132K kvm_mmu_page_header
>> > 162825 162825 100%    0.16K   6513       25     26052K sigqueue
>> >  16368  16368 100%    1.02K    528       31     16896K nfs_inode_cache
>> >  20385  20385 100%    0.58K    755       27     12080K inode_cache
>> >
>> > Repeating that for more (and bigger) directories, the slab cache added up
>> > to more than half the memory on this 24GB node.
>> >
>> > This is with CentOS-7.6 and lustre-2.10.5_ddn6.
>> >
>> > I worked around the problem by tackling the "ldlm_locks" memory usage with:
>> > # lctl set_param ldlm.namespaces.lfs*.lru_max_age=10000
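>> >
>> > (If it helps anyone repeat this, the value can be checked before and
>> > after with the matching get_param, e.g.:
>> > # lctl get_param ldlm.namespaces.lfs*.lru_max_age
>> > )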
>> >
>> > ...but I did not find a way to reduce the "signal_cache".
>> >
>> > Regards,
>> > Nathan
>>
>
>
> -- 
> *Jacek Tomaka*
> Geophysical Software Developer
>
> *DownUnder GeoSolutions*
> 76 Kings Park Road
> West Perth 6005 WA, Australia
> *tel *+61 8 9287 4143
> jacekt at dug.com
> *www.dug.com <http://www.dug.com>*

