[Lustre-discuss] page allocation failure

Andreas Dilger adilger at sun.com
Thu Nov 27 19:42:47 PST 2008


On Nov 28, 2008  10:16 +0800, Wang lu wrote:
>   Thank you very much for your suggestion. I have one more question. We
> have more than 100 client nodes running 32-bit Linux. When we switch the
> OSS kernel to 64-bit, is there any special configuration we should do?

No, there is no requirement for the kernels on the OSS and clients to match.

> Andreas Dilger wrote:
>
>> On Nov 26, 2008  19:04 +0800, Wang lu wrote:
>>> The memory %util on the OSS was always around 10%, even when the OSS was
>>> about to die.  The OSS kernel is:  2.6.9-67.0.7.EL_lustre.1.6.5smp (32-bit)
>>>
>>> Lustre version is 1.6.5.1
>>>
>>> We have 8GB of physical memory and 16GB of swap in total (never used).
>>>
>>> Is there a problem with memory management?
>>
>> The problem is with the 32-bit kernel.  A 32-bit Linux kernel can use
>> only about 900MB of memory (the "normal" zone) for its own allocations,
>> no matter how much RAM is installed.  900MB/8192MB ~= 10% of RAM.  Swap
>> is not useful for the kernel.
>>
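As a rough check of the arithmetic above, here is a minimal sketch in Python.
The 900MB figure is the approximate limit quoted in this thread, not an exact
kernel constant, and the function name is only illustrative.

```python
# Figures taken from this thread; 900MB is an approximation of the limit.
LOWMEM_MB = 900
INSTALLED_MB = 8 * 1024  # 8GB of physical RAM

def kernel_usable_fraction(lowmem_mb, installed_mb):
    """Fraction of installed RAM that falls in the kernel's "normal" zone."""
    return min(lowmem_mb, installed_mb) / installed_mb

fraction = kernel_usable_fraction(LOWMEM_MB, INSTALLED_MB)
print(f"{fraction:.1%} of RAM is directly usable by the kernel")
```

This gives roughly 11%, which is in line with the ~10% memory utilization
reported on the OSS.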
>>> Nov 20 19:40:45 boss02 kernel: Normal: 640*4kB 109*8kB 127*16kB 60*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 7384kB
>>> Nov 20 19:40:45 boss02 kernel: HighMem: 376*4kB 1162*8kB 815*16kB 299*32kB 160*64kB 61*128kB 34*256kB 25*512kB 8*1024kB 1*2048kB 1786*4096kB = 7398656kB
>>
>> As you can see, all of the memory is available in "highmem" and not in
>> the "normal" memory region that the kernel uses.
>>
>>> Nov 21 05:48:44 boss02 kernel: ll_ost_io_114: page allocation failure. order:4, mode:0x50
>>
>> These are "order 4" allocations (16 contiguous 4kB pages = 64kB), which
>> the kernel is bad at handling under memory pressure in any case.  You can
>> see in the "Normal" zone above that there are no free chunks of 64kB or
>> larger left to allocate.
>>
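To make the reading of those free lists concrete, here is a small Python
sketch that parses the "count*sizekB" entries from the log lines quoted above
and checks whether a zone has any free block big enough for a given order.
The helper names are hypothetical, and the 4kB page size is an assumption
that matches the smallest block size in the log.

```python
import re

PAGE_KB = 4  # assumption: 4kB pages, matching the smallest block in the log

def parse_free_list(line):
    """Return {block_size_kB: free_block_count} from 'count*sizekB' entries."""
    return {int(size): int(count)
            for count, size in re.findall(r"(\d+)\*(\d+)kB", line)}

def total_free_kb(free):
    """Total free memory in the zone, in kB."""
    return sum(size * count for size, count in free.items())

def can_satisfy(free, order):
    """True if some free block holds 2**order contiguous pages."""
    needed_kb = PAGE_KB << order  # order 4 -> 64kB
    return any(count > 0 and size >= needed_kb
               for size, count in free.items())

normal = parse_free_list(
    "Normal: 640*4kB 109*8kB 127*16kB 60*32kB 0*64kB 0*128kB 0*256kB "
    "0*512kB 0*1024kB 0*2048kB 0*4096kB = 7384kB")
highmem = parse_free_list(
    "HighMem: 376*4kB 1162*8kB 815*16kB 299*32kB 160*64kB 61*128kB "
    "34*256kB 25*512kB 8*1024kB 1*2048kB 1786*4096kB = 7398656kB")

print("Normal free kB:", total_free_kb(normal))
print("Normal can satisfy order 4:", can_satisfy(normal, 4))
print("HighMem can satisfy order 4:", can_satisfy(highmem, 4))
```

The "Normal" zone adds up to the 7384kB shown in the log but has nothing at
64kB or above, so an order-4 request fails there even though "HighMem" has
plenty of large blocks.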
>>> Nov 20 19:40:46 boss02 kernel:  [<c02b162a>] tcp_v4_do_rcv+0x1b/0xe9
>>> Nov 20 19:40:46 boss02 kernel:  [<fb18fd06>] ost_handle+0xe56/0x5790 
>>
>> It appears that the memory allocation failures come from the TCP
>> stack.  I suspect that you are using TCP with jumbo frames.
>>
>> The easiest solution is to run a 64-bit kernel, which I suspect should
>> be possible given that hardly any 32-bit machines allow more than 4GB
>> of RAM.  Alternatively, you could use regular (non-jumbo) Ethernet
>> frames, which may help somewhat, but that won't let the kernel use the
>> other 7GB of RAM in the system.
>>
>> Cheers, Andreas
>> --
>> Andreas Dilger
>> Sr. Staff Engineer, Lustre Group
>> Sun Microsystems of Canada, Inc.
>>
>

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.



