[lustre-discuss] Lustre client memory and MemoryAvailable

NeilBrown neilb at suse.com
Tue Apr 16 17:05:35 PDT 2019


On Tue, Apr 16 2019, Jacek Tomaka wrote:

> I did :
> for i in {1..100}; do cat /proc/36960/stack >$i; sleep 1; done
> in one bash and on the other one(36960):
> time -p echo 3 >/proc/sys/vm/drop_caches
>
> It took about two minutes, unfortunately most of the time it claims that it
> was not doing anything kernel side:
> [<ffffffffffffffff>] 0xffffffffffffffff

This suggests that the time taken is largely CPU time - walking lists
and moving things around and freeing memory.
With a slow CPU and lots of memory 2 minutes might be a reasonable
number, though it seems a little long.
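One cheap cross-check (a sketch, not from the original thread): the cumulative CPU time of the writing shell is visible in /proc/PID/stat, so reading it before and after the write shows whether the two minutes were really on-CPU kernel time rather than sleeping.

```shell
# Fields 14 (utime) and 15 (stime) of /proc/<pid>/stat are cumulative
# CPU ticks for the process; a large stime delta across the
# drop_caches write means the time was genuinely kernel CPU time.
pid=$$    # substitute the pid of the shell doing the write (36960 above)
awk '{printf "utime=%s stime=%s ticks\n", $14, $15}' /proc/$pid/stat
```

(`time -p` itself already prints user/sys seconds alongside real, so its sys figure is the first thing to check.)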

> with the exception of two files, at 32 sec and 73 sec:
> root at xxx xxx]# cat 32
> [<ffffffffc11231db>] cl_sync_file_range+0x2db/0x380 [lustre]
> [<ffffffffffffffff>] 0xffffffffffffffff

In this case the process probably was scheduled to another CPU while the
stack was being examined, so it suddenly became undefined and the
examination was aborted.

> [root at xxx xxx]# cat 73
> [<ffffffffc11231db>] cl_sync_file_range+0x2db/0x380 [lustre]
> [<ffffffffc11330f6>] ll_delete_inode+0xa6/0x1c0 [lustre]
> [<ffffffff8121d729>] evict+0xa9/0x180
> [<ffffffff8121d83e>] dispose_list+0x3e/0x50
> [<ffffffff8121e834>] prune_icache_sb+0x174/0x340
> [<ffffffff81203863>] prune_super+0x143/0x170
> [<ffffffff81195443>] shrink_slab+0x163/0x330
> [<ffffffff812655f3>] drop_caches_sysctl_handler+0xc3/0x120
> [<ffffffff8127c203>] proc_sys_call_handler+0xd3/0xf0
> [<ffffffff8127c234>] proc_sys_write+0x14/0x20
> [<ffffffff81200cad>] vfs_write+0xbd/0x1e0
> [<ffffffff81201abf>] SyS_write+0x7f/0xe0
> [<ffffffff816b5292>] tracesys+0xdd/0xe2
> [<ffffffffffffffff>] 0xffffffffffffffff

So 2 of the 100 samples landed in cl_sync_file_range - perhaps
roughly 2% of the elapsed time is being spent there?
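The 100 one-second samples can be collapsed into a crude profile by counting how often each stack frame appears across all of the files (a sketch; assumes the numbered sample files from the loop above are in the current directory):

```shell
# Tally stack frames across the sampled /proc/PID/stack snapshots;
# frames that dominate the count dominate the time the process
# spent blocked in them.
grep -h '^\[<' [0-9]* | awk '{print $2}' | sort | uniq -c | sort -rn | head
```

With the samples shown above, 0xffffffffffffffff would top the list, with cl_sync_file_range+0x2db/0x380 appearing twice.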

Calling ll_delete_inode is very near the end of the process of
removing an inode from the cache.
By the time it gets to this point there shouldn't be any dirty data in
the file, so calling cl_sync_file_range() seems odd.
The fact that it (apparently) needs to wait for something to happen
seems even more odd.
Maybe it is releasing some locks or something like that - I'd need to
dig through the code more to be sure.  I doubt this contributes a lot to
the time though.

Unfortunately I don't think we can gather more data without something
intrusive like kprobes...

>
> also after unmounting lustre fs and removing all modules i could relate to
> lustre i could still see some vvp_object_kmem, is it expected?
> [root at xxx xxx]# rmmod obdclass ptlrpc ksocklnd libcfs lnet lustre fid mdc
> osc cnetmgc fld lmv lov;
> rmmod: ERROR: Module obdclass is not currently loaded
> rmmod: ERROR: Module ptlrpc is not currently loaded
> rmmod: ERROR: Module ksocklnd is not currently loaded
> rmmod: ERROR: Module libcfs is not currently loaded
> rmmod: ERROR: Module lnet is not currently loaded
> rmmod: ERROR: Module lustre is not currently loaded
> rmmod: ERROR: Module fid is not currently loaded
> rmmod: ERROR: Module mdc is not currently loaded
> rmmod: ERROR: Module osc is not currently loaded
> rmmod: ERROR: Module cnetmgc is not currently loaded
> rmmod: ERROR: Module fld is not currently loaded
> rmmod: ERROR: Module lmv is not currently loaded
> rmmod: ERROR: Module lov is not currently loaded
> [root at xxx xxx]# cat /proc/slabinfo |grep vvp
> vvp_object_kmem    32982  33212    176   46    2 : tunables    0    0    0
> : slabdata    722    722      0
>

vvp_object_kmem is created by the 'lustre' module, and should be
discarded when that module is unloaded.
The only explanation I can find for it remaining is slab cache
merging: if two different modules each request a slab cache whose
objects would be laid out identically - 46 objects across 2 pages,
i.e. roughly 175 to 178 bytes per object - the kernel gives them a
single shared cache, and the name reported in /proc/slabinfo is
whichever cache was created first.
So maybe some other module wants a slab cache of about that size.
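On SLUB kernels that merging can be checked directly: /sys/kernel/slab has one directory per real cache and a symlink for every merged alias, so all names sharing a cache resolve to the same target (a sketch; the helper name is mine):

```shell
# Print every cache name merged into the same underlying SLUB cache
# as a given one; merged aliases are symlinks to one shared
# directory under /sys/kernel/slab (SLUB kernels only).
slab_aliases() {
    slabdir=$1 cache=$2
    target=$(readlink -f "$slabdir/$cache") || return 1
    [ -n "$target" ] || return 1
    for link in "$slabdir"/*; do
        [ "$(readlink -f "$link")" = "$target" ] && basename "$link"
    done
    return 0
}

slab_aliases /sys/kernel/slab vvp_object_kmem || true  # empty if not SLUB / cache absent
```

(`slabinfo -a`, from the kernel's tools/vm, reports the same alias groups.)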

The other possible explanation is that the cache wasn't empty when the
lustre module was unloaded.  In that case you would have seen a message:
  kmem_cache_destroy %s: Slab cache still has objects
together with a Call Trace.  That would indicate a problem in lustre.

NeilBrown