[Lustre-discuss] lctl set_param /proc/fs/lustre/llite/lusfs0*/max_cached_mb ???
Andreas Dilger
adilger at sun.com
Wed Jun 10 17:10:44 PDT 2009
On Jun 04, 2009 17:52 -0500, Andrea Rucks wrote:
> What I'm seeing is that WPS eventually takes all 15.5 GB of available
> memory (or tries to) and then my server will hang and show an out of
> memory error on the console:
>
> Call Trace:
> [<ffffffff802bc998>] out_of_memory+0x8b/0x203
> Free pages: 6040kB (0kB HighMem)
> active:8348340kB inactive:7937820kB present:16785408kB
So about 8GB is inactive cached memory, which should be reclaimable
under memory pressure.
> pages_scanned:375028407 all_unreclaimable? yes
> lowmem_reserve[]: 0 0 0 0
> DMA32 free:0kB min:0kB low:0kB high:0kB active:0kB
> inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
> inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
> active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
This _should_ mean that some pages are reclaimable; I'm not sure why
they are not being freed.
> If we stop WPS as it begins chewing through RAM, we still see a lot of
> memory in use (Lustre client cache). As I unmount each Lustre filesystem,
> I gain back a significant portion of memory (about 7 GB back total).
That by itself isn't indicative of a problem: Linux/Lustre keeps cached
data that isn't in use (inactive) in case it is needed again later.
> I'd like to now limit the Lustre clients to the following, but I'm not
> sure if doing so will mess things up:
>
> lctl set_param /proc/fs/lustre/llite/lusfs01*/max_cached_mb 2048 # Lustre
> Default is 12288
Note: you can use "lctl set_param llite.*.max_cached_mb=2048" as a
shortcut for this. Also be aware that many separate caches (i.e.
multiple filesystems) are less efficient than a single large filesystem.
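As a sketch of that shortcut (the "lusfs01" filesystem name is taken from
the command quoted above; adapt the glob to your own mounts):

```shell
# Check the current per-filesystem limit first
lctl get_param llite.*.max_cached_mb

# Cap the client page cache at 2GB for every mounted Lustre filesystem...
lctl set_param llite.*.max_cached_mb=2048

# ...or only for one filesystem, e.g. lusfs01
lctl set_param llite.lusfs01*.max_cached_mb=2048
```

Keep in mind that set_param changes are runtime-only and do not persist
across a remount of the filesystem.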
> So, here are my questions. Why is 75% the default for max_cached_mb?
Just a reasonable maximum amount of cached data. Something has to be
kept available for application use.
> What will happen if I "lctl set_param llite.*.max_cached_mb 2048" instead
> of 12288 where it is today for each of those filesystems mentioned above?
It should cap the cached data at 2GB per filesystem.
> How am I affecting the performance of the client by making that change?
That depends on how much your applications re-use cached data.
> Is this a bad thing to do or no big deal?
For Lustre, no big deal. Depends again on how much cached data affects
your application performance.
> Some filesystems are more heavily used than others, should I give them
> more memory?
Seems reasonable.
> Some filesystems have large files that I'm sure end up sitting up in
> memory, should I give them more memory?
Depends if your application re-uses files or not.
> I know the ldlm lru_size can be used to flush cache, but I don't
> think that's a wise thing to do, people might lose...locks (true/false?)
Clearing all of the locks will in turn flush all of your caches, so it
is only a short-term fix unless you put a hard limit on the number of
locks for each filesystem. Getting that right is hard.
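Both approaches look roughly like the following (check the exact lru_size
semantics against your Lustre version's manual before relying on them):

```shell
# One-shot flush: drop all locks on every namespace, which in turn
# drops the cached data protected by those locks
lctl set_param ldlm.namespaces.*.lru_size=clear

# Hard limit: cap each namespace at a fixed number of locks
# (setting 0 restores the default dynamic LRU sizing)
lctl set_param ldlm.namespaces.*.lru_size=400
```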
> on files they're downloading or something, right? Is there another cache
> tunable where I can flush cached things that are two hours or more old,
> but leave the newer stuff (a max_cache_time parameter perhaps)?
Yes, there is the ldlm.namespaces.*.lru_max_age parameter you could tune.
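For a two-hour maximum lock age, something like the following should
work; note that the units of lru_max_age have differed between Lustre
releases (older ones take milliseconds), so inspect the current value
first rather than assuming:

```shell
# Inspect the current setting and infer the units from its magnitude
lctl get_param ldlm.namespaces.*.lru_max_age

# Two hours, if the parameter is in milliseconds on your release
lctl set_param ldlm.namespaces.*.lru_max_age=7200000
```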
Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.