[Lustre-discuss] Lustre client memory usage very high
Guillaume Demillecamps
guillaume at multipurpose.be
Wed Jul 22 02:45:49 PDT 2009
Hello people,
Lustre 1.8.0 on all servers and clients involved. The OS is SLES 10
SP2 with an un-patched kernel on the clients; I have, however, installed
on the clients the same kernel revision (downloaded from suse.com) as the
one used on the Lustre-patched MGS/MDS/OSS servers. The file system is only
several GBs, with ~500000 files. All inter-connections are through TCP.
We have some “manual” replication from an active Lustre file system to a
passive Lustre file system: “sync” clients basically
mount both file systems and run large rsync jobs from the active Lustre
to the passive Lustre. So far, so good (apart from it being quite a slow
process). My issue, however, is that Lustre's memory usage rises so high
that rsync cannot get enough RAM to finish its job before kswapd kicks
in and slows things down drastically.
Up to now, I have succeeded in fine-tuning things using the following
steps in my rsync script:
########
umount /opt/lustre_a
umount /opt/lustre_z
mount /opt/lustre_a
mount /opt/lustre_z
for i in /proc/fs/lustre/osc/*/max_dirty_mb; do echo 4 > $i; done
for i in /proc/fs/lustre/ldlm/namespaces/*/lru_max_age; do echo 30 > $i; done
for i in /proc/fs/lustre/llite/*/max_cached_mb; do echo 64 > $i; done
echo 64 > /proc/sys/lustre/max_dirty_mb
lctl set_param ldlm.namespaces.*osc*.lru_size=100
sysctl -w lnet.debug=0
########
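In case it helps anyone trying the same thing: if the memory is mostly held by DLM locks and the pages they cover, it may be reclaimable without un-mounting. A sketch, assuming a standard Lustre 1.8 client (the `lru_size=clear` setting and `drop_caches` are the assumptions here, not something I have verified on this setup):

```shell
#!/bin/sh
# Sketch: try to reclaim Lustre client memory between rsync runs
# without un-mounting (assumes a standard Lustre 1.8 client).

# Drop all locks from each namespace's LRU; cached pages covered
# by those locks should be released along with them.
lctl set_param ldlm.namespaces.*.lru_size=clear

# Ask the kernel to drop clean page cache and slab objects.
sync
echo 3 > /proc/sys/vm/drop_caches

# Check how much memory the Lustre modules still report in use.
cat /proc/sys/lustre/memused
```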
What I still don't understand is why, even with a max limit of
a few MB on the read cache (max_cached_mb / max_dirty_mb) and with the
write-cache aging (lru_max_age ? is that the right parameter ?) set to a
very small value, /proc/sys/lustre/memused still sky-rockets to several
GBs. As soon as I un-mount the file systems, it drops; but the memused
number will not decrease even if the client remains idle for several days
with no I/O from/to any Lustre file system. Note that splitting the
rsync job into more numerous, smaller jobs does not help, unless
I start un-mounting and re-mounting the Lustre file systems between
each job (which is nevertheless what I may have to plan if there is no
further parameter which would help me)!
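For what it's worth, the remount-between-jobs workaround could look like this (the job list, rsync flags, and per-directory layout are hypothetical placeholders; the mount points are the ones from my script above):

```shell
#!/bin/sh
# Hypothetical wrapper: re-mount both Lustre file systems between
# rsync jobs so that client memory is released after each cycle.
# /opt/lustre_a is the active fs, /opt/lustre_z the passive one.

for dir in dir1 dir2 dir3; do   # placeholder job list
    umount /opt/lustre_a /opt/lustre_z
    mount /opt/lustre_a
    mount /opt/lustre_z
    rsync -a --delete "/opt/lustre_a/$dir/" "/opt/lustre_z/$dir/"
done
```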
Any help/guidance/hint/... is very much appreciated.
Thank you,
Guillaume Demillecamps