[Lustre-discuss] Can't increase effective client read cache

Thu Sep 26 08:21:13 PDT 2013

> -----Original Message-----
> From: Dilger, Andreas [mailto:andreas.dilger at intel.com]
> Sent: Wednesday, September 25, 2013 10:03 AM
> To: Carlson, Timothy S
> Cc: lustre-discuss at lists.lustre.org; hpdd-discuss at lists.01.org
> Subject: Re: [Lustre-discuss] Can't increase effective client read cache
> 
> On 2013-09-24, at 14:15, "Carlson, Timothy S" <Timothy.Carlson at pnnl.gov>
> wrote:
> 
> > I've got an odd situation that I can't seem to fix.
> >
> > My setup is Lustre 1.8.8-wc1 clients on RHEL 6 talking to 1.8.6 servers on RHEL
> 5.
> >
> > My compute nodes have 64 GB of memory and I have a use case where an
> application has very low memory usage and needs to access a few thousand
> files in Lustre that range from 10 to 50 MB.  The files are subject to some reuse
> and it would be advantageous to cache as much of the data as possible.  The
> default cache for this configuration would be 48GB on the client as that is 75%
> of memory.   However the client never caches more than about 40GB of data
> according to /proc/meminfo
> >
> > Even if I tune the cached memory to 64GB the amount of cache in use never
> goes past 40GB. My current setting is as follows
> >
> > # lctl get_param llite.*.max_cached_mb
> > llite.olympus-ffff8804069da800.max_cached_mb=64000
> >
> > I've also played with some of the VM tunable settings, such as turning
> vfs_cache_pressure down to 10:
> >
> > # vm.vfs_cache_pressure = 10
> >
> > In no case do I see more than about 35GB of cache being used.   To do some
> more testing on this I created a bunch (40) of 2G files in Lustre and then copied
> them to /dev/null on the client. While doing this I ran the fincore tool from
> http://code.google.com/p/linux-ftools/ to see if the file was still in cache. Once
> about 40GB of cache was used, the kernel started to drop files from the cache
> even though there was no memory pressure on the system.
> >
> > If I do the same test with files local to the system, I can fill all the cache to
> about 61GB before files start getting dropped.
> >
> > Is there some other Lustre tunable on the client that I can twiddle with to
> make more use of the local memory cache?
> 
> This might relate to the number of DLM locks cached on the client. If the locks
> get cancelled for some reason (e.g. memory pressure on the server, old age)
> then the pages covered by the locks will also be dropped.
> 
> You could try disabling the lock LRU and specify some large static number of
> locks (for testing, I wouldn't leave this set for production systems with large
> numbers of clients):
> 
>     lctl set_param ldlm.namespaces.*.lru_size=10000
> 
> To reset it to dynamic DLM LRU size management set a value of "0".
> 
> Cheers, Andreas

I gave that a try but it didn't seem to help. On my generic example of copying 2G files to /dev/null, the lru_size of all the OSTs is under 10 except for 3 that I have permanently marked as inactive (and they are at 3200) and the MDS which is at 143. Here I dd'ed 20 2G files into /dev/null but only about 30GB is still in cache. 

# lctl get_param ldlm.namespaces.*.lru_size | awk -F\= '{print $2}' | sort -n | uniq -c
     29 0
    129 1
     73 2
     32 3
      4 4
      1 5
      2 9
      1 423
      3 3200
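The histogram above drops the namespace names, so it does not show which namespaces hold the larger counts. A minimal sketch of a variant that keeps the names and sorts by value is below. It assumes the `ldlm.namespaces.<namespace>.lru_size=<n>` line format that `lctl get_param` prints; the helper names `lru_by_namespace` and `lru_total` are made up for illustration.

```shell
#!/bin/sh
# Sketch only: both helpers read "ldlm.namespaces.<namespace>.lru_size=<n>"
# lines on stdin. The function names are hypothetical, not part of Lustre.

# Print "<n> <namespace>" pairs, largest lru_size first.
lru_by_namespace() {
    awk -F= '{
        name = $1
        sub(/^ldlm\.namespaces\./, "", name)   # strip the common prefix
        sub(/\.lru_size$/, "", name)           # strip the parameter name
        print $2, name
    }' | sort -rn
}

# Print the sum of lru_size over all namespaces on stdin.
lru_total() {
    awk -F= '{ total += $2 } END { print total + 0 }'
}

# Usage on a client (needs lctl, so it is left commented out here):
#   lctl get_param ldlm.namespaces.*.lru_size | lru_by_namespace | head
#   lctl get_param ldlm.namespaces.*.lru_size | lru_total
```

Piping the first helper through `head` lists the namespaces with the highest values, which makes it easier to tell the inactive OSTs and the MDS apart from the rest.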

Any other thoughts on parameters to twiddle?

Thanks!

Tim