[Lustre-discuss] Page allocation failure

Andreas Dilger adilger at sun.com
Fri Aug 28 01:30:41 PDT 2009


On Aug 28, 2009  15:22 +0800, Lu Wang wrote:
> We have seen an unusually high frequency of compute node crashes these days,
> after we added 10 more OSSes to the present Lustre system.
:
:
> Aug 27 11:50:46 bws0202 kernel: Normal free:864kB min:928kB low:1856kB high:2784kB active:89892kB inactive:25980kB present:9

Note that very little "Normal" memory is free: 864kB, which is below the
zone's min watermark of 928kB.  This is the only memory that the kernel
can use for its own caching.

> Aug 27 11:50:46 bws0202 kernel: HighMem free:3890880kB min:512kB low:1024kB high:1536kB active:3051644kB inactive:8889744kB 

There is almost 4GB of highmem free, but it can't be used by kernel
allocations on 32-bit systems.
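
As a rough illustration of how to read the two zone lines above, here is
a minimal Python sketch that pulls the "name:NNNkB" fields out of such a
line and checks whether free has dropped below min.  The function names
are made up for this example, and the sample strings omit the truncated
trailing field of the original log lines.

```python
import re

# Matches "name:123kB" fields as they appear in the quoted zone lines.
FIELD_RE = re.compile(r"(\w+):(\d+)kB")
ZONE_RE = re.compile(r"\b(DMA|Normal|HighMem)\b")


def parse_zone_line(line):
    """Return (zone_name, {field: kB}) for a kernel zone summary line."""
    m = ZONE_RE.search(line)
    if m is None:
        raise ValueError("no zone name found in line")
    # Only look at the text after the zone name, so the syslog
    # timestamp and hostname are never mistaken for fields.
    fields = {name: int(kb) for name, kb in FIELD_RE.findall(line[m.end():])}
    return m.group(1), fields


def below_min(fields):
    """True if the zone's free memory is under its min watermark."""
    return fields["free"] < fields["min"]


normal = ("Aug 27 11:50:46 bws0202 kernel: Normal free:864kB min:928kB "
          "low:1856kB high:2784kB active:89892kB inactive:25980kB")
highmem = ("Aug 27 11:50:46 bws0202 kernel: HighMem free:3890880kB min:512kB "
           "low:1024kB high:1536kB active:3051644kB inactive:8889744kB")

for line in (normal, highmem):
    zone, fields = parse_zone_line(line)
    print(zone, "below min" if below_min(fields) else "ok")
```

For the lines quoted here, the Normal zone is under its min watermark
while HighMem is not, which matches the failure pattern described above.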

> The computing nodes are running "lustre-1.6.5-2.6.9_55.EL.cernsmp",
> with 16 GB of memory on a 32-bit OS. Servers are running

A 32-bit kernel cannot make good use of this much memory, because its own
allocations are confined to the small Normal zone.  Use a 64-bit kernel
instead.

> "2.6.9-67.0.22.EL_lustre.1.6.6smp" on a 64-bit OS. Every computing node
> is mounting two Lustre file systems: one with 20 OSSes, one with 2 OSSes.
> I have set
> /proc/fs/lustre/llite/*/max_cached_mb=4158 for each Lustre file system.

This is far too much cache for a 32-bit client.  With two file systems
mounted, it allows up to 2 x 4158 = 8316MB of cached data in total.

> Is it possible to control the Normal memory a Lustre client uses with some tuning option?

Reduce the max_cached_mb to a much smaller value (e.g. 1GB) and it
may help avoid problems.
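
A minimal sketch of applying that setting on a client, using the same
/proc path quoted above.  The helper name set_max_cached_mb is made up
for this example, and 1024 is simply the "e.g. 1GB" suggestion expressed
in MB, not a tested recommendation.  Writing to /proc requires root.

```python
import glob


def set_max_cached_mb(value_mb,
                      pattern="/proc/fs/lustre/llite/*/max_cached_mb"):
    """Write value_mb to every file matching pattern.

    Returns the list of paths that were written, so the caller can see
    how many mounted file systems were affected.
    """
    written = []
    for path in sorted(glob.glob(pattern)):
        with open(path, "w") as f:
            f.write("%d\n" % value_mb)
        written.append(path)
    return written


# Example (run as root on the client):
#   set_max_cached_mb(1024)
```

The pattern is a parameter so the function can be tried against a scratch
directory first.  On a node with no Lustre file system mounted the glob
matches nothing and the function returns an empty list.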

> Our servers experienced the same problem when the OS of the OSSes was
> 32-bit. After switching to 64-bit, the problem has not appeared any more.
> It is difficult for us to switch all computing nodes to 64-bit right now.

You can still run 32-bit applications with a 64-bit kernel, if that is
needed, as long as you also install the 32-bit userspace (libraries).
You need to install 64-bit lustre tools, but that should be fine.
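
To check which combination a node is actually running, a small Python
sketch along these lines may help.  It compares the machine name
reported by the kernel with the pointer size of the running interpreter.
The function names are made up for this example, and treating any
machine name ending in "64" as 64-bit is a heuristic assumption, not an
exhaustive rule.

```python
import platform
import struct


def describe_bitness(machine, pointer_bits):
    """Summarize kernel and process word size as a short string."""
    # Heuristic (assumption): 64-bit machine names such as x86_64
    # end in "64"; names such as i686 do not.
    kernel_bits = 64 if machine.endswith("64") else 32
    return "%d-bit kernel, %d-bit process" % (kernel_bits, pointer_bits)


def current_bitness():
    """Describe the node this script is running on."""
    # struct.calcsize("P") is the size of a C pointer in bytes.
    return describe_bitness(platform.machine(), struct.calcsize("P") * 8)


print(current_bitness())
```

A 32-bit Python running on a 64-bit kernel would normally report a
64-bit kernel with a 32-bit process, which is the mixed setup described
above.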

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.



