[Lustre-discuss] lctl set_param /proc/fs/lustre/llite/lusfs0*/max_cached_mb ???

Andrea Rucks andrea.rucks at us.lawson.com
Wed Jun 10 08:37:58 PDT 2009


Hi there,

I'm experiencing some issues with Lustre and WebSphere Portal Server 6.1 
(WPS) co-existing on the Lustre client application server.  WPS likes to 
use a lot of memory.  The servers were originally allocated 16 GB of RAM 
each.  They are Xen-virtualized on RHEL 5.3, running Lustre 1.6.7 patched 
with 16895 (but not fully 1.6.7.1).

What I'm seeing is that WPS eventually takes all 15.5 GB of available 
memory (or tries to), and then my server hangs and shows an out-of-memory 
error on the console:

Call Trace:
 [<ffffffff802bc998>] out_of_memory+0x8b/0x203
 [<ffffffff8020f657>] __alloc_pages+0x245/0x2ce
 [<ffffffff8021336e>] __do_page_cache_readahead+0xd0/0x21c
 [<ffffffff802639f9>] _spin_lock_irqsave+0x9/0x14
 [<ffffffff8023efff>] lock_timer_base+0x1b/0x3c
 [<ffffffff88081d4d>] :dm_mod:dm_any_congested+0x38/0x3f
 [<ffffffff80213c47>] filemap_nopage+0x148/0x322
 [<ffffffff80208db9>] __handle_mm_fault+0x440/0x11f6
 [<ffffffff802666ef>] do_page_fault+0xf7b/0x12e0
 [<ffffffff80207141>] kmem_cache_free+0x80/0xd3
 [<ffffffff8025f82b>] error_exit+0x0/0x6e

DMA per-cpu:
cpu 0 hot: high 186, batch 31 used:73
cpu 0 cold: high 62, batch 15 used:61
cpu 1 hot: high 186, batch 31 used:164
cpu 1 cold: high 62, batch 15 used:59
DMA32 per-cpu: empty
Normal per-cpu: empty
HighMem per-cpu: empty
Free pages:        6040kB (0kB HighMem)
Active:2087085 inactive:1984455 dirty:0 writeback:0 unstable:0 free:1510 slab:9371 mapped-file:992 mapped-anon:4054270 pagetables:11073
DMA free:6040kB min:16384kB low:20480kB high:24576kB active:8348340kB inactive:7937820kB present:16785408kB pages_scanned:375028407 all_unreclaimable? yes
lowmem_reserve[]: 0 0 0 0
DMA32 free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
Normal free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
HighMem free:0kB min:128kB low:128kB high:128kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
DMA: 0*4kB 3*8kB 0*16kB 0*32kB 0*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 1*4096kB = 6040kB
DMA32: empty
Normal: empty
HighMem: empty
17533 pagecache pages
Swap cache: add 6497852, delete 6497594, find 1492942/1923882, race 4+83
Free swap  = 0kB
Total swap = 4194296kB
uptimeagent invoked oom-killer: gfp_mask=0x201d2, order=0, oomkilladj=0


I have six filesystems of varying sizes (from 4 GB up to 1.5 TB), and we 
use Lustre to share them amongst the WebSphere cluster; our use of Lustre 
is commercial in nature, non-HPC, and uses legacy filesystem structures 
(we still use Linux HA, though).

If we stop WPS as it begins chewing through RAM, we still see a lot of 
memory in use (Lustre client cache).  As I unmount each Lustre filesystem, 
I gain back a significant portion of memory (about 7 GB total).  For 
grins, we ripple-stopped each WPS server, adjusted the Xen maxmem value, 
and gave each server an additional 6 GB of RAM for a total of 22 GB.  I'd 
now like to limit the Lustre clients to the following, but I'm not sure if 
doing so will mess things up:

lctl set_param /proc/fs/lustre/llite/lusfs01*/max_cached_mb 2048   # Lustre default is 12288
lctl set_param /proc/fs/lustre/llite/lusfs02*/max_cached_mb 2048   # Lustre default is 12288
lctl set_param /proc/fs/lustre/llite/lusfs03*/max_cached_mb 1048   # Lustre default is 12288
lctl set_param /proc/fs/lustre/llite/lusfs04*/max_cached_mb 1048   # Lustre default is 12288
lctl set_param /proc/fs/lustre/llite/lusfs05*/max_cached_mb 1048   # Lustre default is 12288
lctl set_param /proc/fs/lustre/llite/lusfs06*/max_cached_mb 512    # Lustre default is 12288
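
(In case that path-style form isn't accepted, my understanding is that 
lctl also takes dotted parameter names in param=value form, so the 
equivalent for the first filesystem would look something like the lines 
below; this is just a sketch I haven't run yet, and the lusfs01 glob is 
simply my fsname:

lctl set_param llite.lusfs01*.max_cached_mb=2048   # limit client page cache for lusfs01 to 2 GB
lctl get_param llite.lusfs01*.max_cached_mb        # read it back to confirm the new limit

or, the old-fashioned way, echoing the value straight into /proc:

echo 2048 > /proc/fs/lustre/llite/lusfs01*/max_cached_mb
)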

So, here are my questions.  Why is 75% the default for max_cached_mb? What 
will happen if I "lctl set_param 
/proc/fs/lustre/llite/lusfs01*/max_cached_mb 2048" instead of 12288 where 
it is today for each of those filesystems mentioned above?  How am I 
affecting the performance of the client by making that change?   Is this a 
bad thing to do or no big deal?  Some filesystems are more heavily used 
than others, should I give them more memory?  Some filesystems have large 
files that I'm sure end up sitting up in memory, should I give them more 
memory?  I know the lmld lru_size can be used to flush cache, but I don't 
think that's a wise thing to do, people might lose...locks (true/false?) 
on files they're downloading or something, right?  Is there another cache 
tunable where I can flush cached things that are two hours or more old, 
but leave the newer stuff (a max_cache_time parameter perhaps)?
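
(For reference, the lock-LRU flush I was referring to is, as I understand 
it from the manual, something like the following; again only a sketch I 
have not tried here, and the namespace glob is just my guess at what 
matches the lusfs01 namespaces on the client:

lctl set_param ldlm.namespaces.lusfs01*.lru_size=clear   # drop this client's cached DLM locks for lusfs01, and the pages they cover
echo 3 > /proc/sys/vm/drop_caches                        # kernel-wide: drop clean pagecache plus dentries/inodes
)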

Cheers,

Ms. Andrea D. Rucks
Sr. Unix Systems Administrator,
Lawson ITS Unix Server Team
_____________________________

Lawson
380 St. Peter Street
St. Paul, MN 55102
Tel: 651-767-6252
http://www.lawson.com