[lustre-discuss] SLUB: Unable to allocate memory on node -1

Julien Rey julienrey76 at gmail.com
Sat Oct 30 03:03:33 PDT 2021


Hello Andreas,

Thank you for your prompt response. I had also come to suspect a 
hardware issue. I will try reseating and rearranging the DIMMs and 
will be sure to get back to you if the issue is resolved.

Cheers, Julien.

On 30/10/2021 at 02:46, Andreas Dilger wrote:
> On Oct 29, 2021, at 07:39, Julien Rey via lustre-discuss 
> <lustre-discuss at lists.lustre.org> wrote:
>>
>> Hello,
>>
>> This may not be directly related to Lustre, but here's what I get 
>> when I try to mount our Lustre filesystem on one of our compute nodes 
>> running CentOS 7:
>>
>>
>> Oct 29 14:30:20 gpu-node8 kernel: SLUB: Unable to allocate memory on 
>> node -1 (gfp=0x8050)
>
> Nothing looks "wrong" with the message itself: -1 means "no specific 
> node", and the GFP mask 0x8050 is __GFP_ZERO | __GFP_IO | __GFP_WAIT 
> for this kernel.
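
For anyone reading this in the archive, here is a minimal sketch of how a 
gfp= value from such a log line can be decoded. The bit values are an 
assumption taken from the upstream Linux 3.10 include/linux/gfp.h; a 
vendor kernel may define them differently, and only a handful of flags 
are listed. The name decode_gfp is just an illustrative helper.

```python
# Assumed bit values (upstream Linux 3.10 include/linux/gfp.h).
# Vendor kernels may differ; this is only a partial table.
GFP_BITS = {
    0x10: "__GFP_WAIT",
    0x20: "__GFP_HIGH",
    0x40: "__GFP_IO",
    0x80: "__GFP_FS",
    0x8000: "__GFP_ZERO",
}


def decode_gfp(mask):
    """Return the known flag names set in mask, lowest bit first.

    Any bits not in GFP_BITS are appended as one hex string so that
    nothing is silently dropped.
    """
    names = [name for bit, name in sorted(GFP_BITS.items()) if mask & bit]
    known = sum(bit for bit in GFP_BITS if mask & bit)
    leftover = mask & ~known
    if leftover:
        names.append(hex(leftover))
    return names


if __name__ == "__main__":
    # The mask reported in the SLUB message from this thread.
    print(" | ".join(decode_gfp(0x8050)))
```

Under those assumed values, 0x8050 splits into 0x8000 + 0x40 + 0x10, which 
matches the three flags named above.
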
>
> One time I saw problems like this, it was because all the DIMMs were 
> installed on one socket of a dual-socket NUMA motherboard, and no 
> memory was available on the other socket, but only some allocations 
> failed.
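
A quick way to check for that kind of layout is to look at the per-node 
memory totals the kernel exposes under /sys/devices/system/node. The 
sketch below is only an illustration: the helper names 
(parse_node_meminfo, memoryless_nodes) are made up for this example, and 
it assumes the usual "Node N Field: value kB" line format of the 
nodeN/meminfo files.

```python
import glob
import re

# Matches lines such as "Node 0 MemTotal:       16747660 kB".
LINE = re.compile(r"Node\s+(\d+)\s+(\w+):\s+(\d+)\s*kB")


def parse_node_meminfo(text):
    """Return {node_id: {field: kB}} parsed from nodeN/meminfo-style text."""
    nodes = {}
    for match in LINE.finditer(text):
        node, field, kb = match.groups()
        nodes.setdefault(int(node), {})[field] = int(kb)
    return nodes


def memoryless_nodes(nodes):
    """Return the sorted ids of nodes that report MemTotal of 0 kB."""
    return sorted(n for n, f in nodes.items() if f.get("MemTotal", 0) == 0)


if __name__ == "__main__":
    text = ""
    for path in sorted(glob.glob("/sys/devices/system/node/node*/meminfo")):
        with open(path) as handle:
            text += handle.read()
    nodes = parse_node_meminfo(text)
    for node_id in sorted(nodes):
        fields = nodes[node_id]
        print(node_id, fields.get("MemTotal"), fields.get("MemFree"))
    print("nodes without memory:", memoryless_nodes(nodes))
```

"numactl --hardware" reports similar per-node sizes if numactl is 
installed. A node with CPUs but a MemTotal of 0 would point to the DIMM 
placement described above.
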
>
> Cheers, Andreas
>
>> Oct 29 14:30:20 gpu-node8 kernel:  cache: dm_rq_target_io, object 
>> size: 136, buffer size: 136, default order: 0, min order: 0
>> Oct 29 14:30:20 gpu-node8 kernel:  node 1: slabs: 2, objs: 60, free: 0
>> Oct 29 14:30:20 gpu-node8 kernel: LustreError: 
>> 3097:0:(niobuf.c:994:ptlrpc_register_rqbd()) LNetMDAttach failed: -12;
>> Oct 29 14:30:20 gpu-node8 kernel: LustreError: 
>> 3097:0:(service.c:2551:ptlrpc_main()) Failed to post rqbd for 
>> ldlm_cbd on CPT 0: -1
>> Oct 29 14:30:20 gpu-node8 kernel: LustreError: 
>> 3091:0:(service.c:2917:ptlrpc_start_threads()) cannot start ldlm_cb 
>> thread #0_0: rc -1
>> Oct 29 14:30:20 gpu-node8 kernel: LustreError: 
>> 3091:0:(service.c:837:ptlrpc_register_service()) Failed to start 
>> threads for service ldlm_cbd: -1
>> Oct 29 14:30:20 gpu-node8 kernel: LustreError: 
>> 3091:0:(ldlm_lockd.c:3077:ldlm_setup()) failed to start service
>> Oct 29 14:30:20 gpu-node8 kernel: LustreError: 
>> 3091:0:(ldlm_lib.c:462:client_obd_setup()) ldlm_get_ref failed: -1
>> Oct 29 14:30:20 gpu-node8 kernel: LustreError: 
>> 3091:0:(obd_config.c:559:class_setup()) setup MGC10.0.1.70@tcp failed 
>> (-1)
>> Oct 29 14:30:20 gpu-node8 kernel: LustreError: 
>> 3091:0:(obd_mount.c:202:lustre_start_simple()) MGC10.0.1.70@tcp setup 
>> error -1
>> Oct 29 14:30:20 gpu-node8 kernel: LustreError: 
>> 3091:0:(obd_mount.c:1608:lustre_fill_super()) Unable to mount (-1)
>>
>>
>> I've been scratching my head over this one. It could just be a 
>> kernel bug, but we have 3 other identical servers running the exact 
>> same versions of CentOS 7 and the Lustre client, and I have had no 
>> problems with them.
>>
>> Some more info:
>>
>> [root@gpu-node8 ~]# uname -r
>> 3.10.0-1160.el7.x86_64
>>
>> [root@gpu-node8 ~]# lctl --version
>> lctl 2.12.7
>>
>> [root@gpu-node8 ~]# vmstat -m | grep dm_rq_target_io
>> dm_rq_target_io              60     60    136     30
>>
>> [root@gpu-node8 ~]# free -h
>>               total        used        free      shared  buff/cache   available
>> Mem:            31G        1.4G         29G         10M        117M         29G
>> Swap:           15G          0B         15G
>>
>>
>> I've been playing with the sysctl parameters but I don't really know 
>> what I'm doing and got no result anyway:
>>
>> sysctl vm.overcommit_memory=1
>>
>> sysctl vm.min_free_kbytes=90112
>>
>> sysctl vm.overcommit_kbytes=90112
>>
>>
>> Any help would be greatly appreciated.
>>
>> Thanks!
>>
>> -- 
>> Julien REY
>>
>> Plate-forme RPBS
>> Modélisation Computationnelle des Interactions Protéines-Ligand (CMPLI)
>> Université de Paris
>> tel : 01 57 27 83 95
>>
>
> Cheers, Andreas
> --
> Andreas Dilger
> Lustre Principal Architect
> Whamcloud
-- 
Julien REY

Plate-forme RPBS
Modélisation Computationnelle des Interactions Protéines-Ligand (CMPLI)
Université de Paris
tel : 01 57 27 83 95
