[Lustre-discuss] Lustre client memory usage very high
Andreas Dilger
adilger at sun.com
Wed Jul 29 15:46:27 PDT 2009
On Jul 22, 2009 11:45 +0200, Guillaume Demillecamps wrote:
> Lustre 1.8.0 on all servers / clients involved in this. OS is SLES 10
> SP2 with an un-patched kernel on the clients. I have, however, put the
> same kernel revision downloaded from suse.com on the clients as the
> version used in the Lustre-patched MGS/MDS/OSS servers. The file
> system is only several GBs, with ~500000 files. All inter-connections
> are through TCP.
>
> We have some "manual" replication of an active Lustre file system to a
> passive Lustre file system. We have "sync" clients that basically just
> mount both file systems and run large sync jobs from the active
> Lustre to the passive Lustre. So far, so good (apart from it being
> quite a slow process). However, my issue is that Lustre's memory usage
> climbs so high that rsync cannot get enough RAM to finish its job
> before kswapd kicks in and slows things down drastically.
> Up to now, I have succeeded in fine-tuning things using the following
> steps in my rsync script:
> ########
> umount /opt/lustre_a
> umount /opt/lustre_z
> mount /opt/lustre_a
> mount /opt/lustre_z
> for i in `ls /proc/fs/lustre/osc/*/max_dirty_mb`; do echo 4 > $i ; done
> for i in `ls /proc/fs/lustre/ldlm/namespaces/*/lru_max_age`; do echo 30 > $i ; done
> for i in `ls /proc/fs/lustre/llite/*/max_cached_mb`; do echo 64 > $i ; done
> echo 64 > /proc/sys/lustre/max_dirty_mb
Note that you can do these more easily with
lctl set_param osc.*.max_dirty_mb=4
lctl set_param ldlm.namespaces.*.lru_max_age=30
lctl set_param llite.*.max_cached_mb=64
lctl set_param max_dirty_mb=64
> lctl set_param ldlm.namespaces.*osc*.lru_size=100
> sysctl -w lnet.debug=0
This can also be "lctl set_param debug=0".
> What I still don't understand is that even when putting a max limit of
> a few MB on the read cache (max_cached_mb / max_dirty_mb) and putting
> the write cache (lru_max_age, is that correct?) to a very limited
> number, memory usage still sky-rockets to several GBs in
> /proc/sys/lustre/memused?
Can you please check /proc/slabinfo to see what kind of memory is being
allocated the most?  The max_cached_mb/max_dirty_mb settings are only
limits on the cached/dirty data pages, not on metadata structures.
Also, in 30s I expect a LOT of inodes can be traversed, so that might
be your problem, and even then lock cancellation does not necessarily
force the kernel dentries/inodes out of memory.
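As a sketch of that check, something like the following ranks slab caches
by approximate memory footprint (the field positions assume the usual
"slabinfo - version: 2.x" layout, and `top_slabs` is just an illustrative
name; /proc/slabinfo is typically root-readable only):

```shell
# Rank slab caches by approximate memory footprint (num_objs * objsize).
# Column layout assumed: name active_objs num_objs objsize ...
top_slabs() {
    awk 'NR > 2 { kib = $3 * $4 / 1024
                  printf "%-28s %10d objs %7d B/obj %10.0f KiB\n", $1, $3, $4, kib }' \
        "${1:-/proc/slabinfo}" | sort -k6 -rn | head -10
}
```

Running "top_slabs" as root then shows which caches (e.g. Lustre inode or
ldlm lock slabs) dominate the memused number.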
Getting total lock counts would also help:
lctl get_param ldlm.namespaces.*.resource_count
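For a single total rather than per-namespace numbers, the counts can be
summed, e.g. over lock_count (a sketch; lock_count sits next to
resource_count in the same namespaces directory):

```shell
# Total LDLM locks currently held by this client, summed across all
# namespaces; prints 0 if lctl is unavailable or no locks are held.
lctl get_param -n 'ldlm.namespaces.*.lock_count' 2>/dev/null \
    | awk '{ sum += $1 } END { print sum + 0 }'
```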
You might be able to tweak some of the "normal" (not Lustre-specific)
/proc parameters to flush the inodes from cache more quickly, or
increase the rate at which kswapd flushes unused inodes.
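Concretely, on a stock SLES 10 (2.6.16+) kernel the generic VM knobs
might look like this; the values are illustrative assumptions, not
tested recommendations, and both commands need root:

```shell
# Bias reclaim toward dentry/inode caches (default is 100; higher means
# the VM reclaims them more aggressively relative to pagecache).
sysctl -w vm.vfs_cache_pressure=200
# Between rsync jobs, clean reclaimable slab objects (dentries and
# inodes) can also be dropped outright:
sync
echo 2 > /proc/sys/vm/drop_caches
```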
> And as soon as I un-mount the disks, it drops. The memused number,
> however, will not decrease even if the client remains idle for several
> days with no I/O from/to any Lustre file system. Note that splitting
> the rsync job into more, smaller jobs does not help.
There is a test program called "memhog" that could force memory to be
flushed between jobs, but that is a sub-standard solution.
> Unless I start un-mounting and re-mounting the Lustre file systems
> between each job (which is nevertheless what I may have to plan if no
> other parameter helps me)!
Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.