[lustre-discuss] SLUB: Unable to allocate memory on node -1
Julien Rey
julienrey76 at gmail.com
Sat Oct 30 03:03:33 PDT 2021
Hello Andreas,
Thank you for your prompt response. I had also come to suspect a hardware
issue. I will try rearranging the DIMMs and will be sure to get back to
you if that resolves the issue.
Cheers, Julien.
On 30/10/2021 at 02:46, Andreas Dilger wrote:
> On Oct 29, 2021, at 07:39, Julien Rey via lustre-discuss
> <lustre-discuss at lists.lustre.org
> <mailto:lustre-discuss at lists.lustre.org>> wrote:
>>
>> Hello,
>>
>> This may not be directly related to Lustre, but here's what I get
>> when I try to mount our Lustre filesystem on one of our compute nodes
>> running CentOS 7:
>>
>>
>> Oct 29 14:30:20 gpu-node8 kernel: SLUB: Unable to allocate memory on node -1 (gfp=0x8050)
>
> There doesn't appear to be anything "wrong" here: -1 means "no specific
> node" (NUMA_NO_NODE), and the GFP mask is __GFP_ZERO | __GFP_IO |
> __GFP_WAIT for this kernel.
>
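The mask in the SLUB message can be decoded mechanically. The sketch below is a minimal illustration and assumes the `___GFP_*` bit values from the 3.10-era `include/linux/gfp.h` used by CentOS 7 kernels. Only a subset of flags is listed, and newer kernels renumbered these bits, so the table must not be reused for other kernels. It also treats node -1 as `NUMA_NO_NODE`. The names `decode_gfp`, `describe_node` and `GFP_FLAGS_3_10` are hypothetical helpers, not part of any library.

```python
# Subset of GFP flag bits, assumed from the 3.10-era include/linux/gfp.h
# (CentOS 7 kernels). Newer kernels use different values.
GFP_FLAGS_3_10 = {
    0x01: "__GFP_DMA",
    0x02: "__GFP_HIGHMEM",
    0x04: "__GFP_DMA32",
    0x08: "__GFP_MOVABLE",
    0x10: "__GFP_WAIT",
    0x20: "__GFP_HIGH",
    0x40: "__GFP_IO",
    0x80: "__GFP_FS",
    0x100: "__GFP_COLD",
    0x200: "__GFP_NOWARN",
    0x400: "__GFP_REPEAT",
    0x800: "__GFP_NOFAIL",
    0x1000: "__GFP_NORETRY",
    0x4000: "__GFP_COMP",
    0x8000: "__GFP_ZERO",
}

NUMA_NO_NODE = -1  # include/linux/numa.h: "no preferred node"


def decode_gfp(mask):
    """Return the flag names set in mask, lowest bit first; leftover bits are reported as unknown."""
    names = []
    remaining = mask
    for bit, name in sorted(GFP_FLAGS_3_10.items()):
        if mask & bit:
            names.append(name)
            remaining &= ~bit
    if remaining:
        names.append("unknown(0x%x)" % remaining)
    return names


def describe_node(node):
    """Explain the node number printed by SLUB."""
    return "any node (NUMA_NO_NODE)" if node == NUMA_NO_NODE else "node %d" % node


if __name__ == "__main__":
    # Values from the log line: node -1, gfp=0x8050
    print(describe_node(-1) + ": " + " | ".join(decode_gfp(0x8050)))
    # any node (NUMA_NO_NODE): __GFP_WAIT | __GFP_IO | __GFP_ZERO
```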
> The one time I saw problems like this, it was because all the DIMMs were
> installed on one socket of a dual-socket NUMA motherboard, so no memory
> was available on the other socket, and only some allocations failed.
>
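To test the DIMM hypothesis on a node like gpu-node8, one can compare how much memory each NUMA node reports. This sketch assumes the standard Linux sysfs per-node files `/sys/devices/system/node/nodeN/meminfo`, whose MemTotal line looks like `Node 0 MemTotal: ... kB`. The helper names are hypothetical. A node reporting 0 kB, or far less than its peers, would suggest an unbalanced DIMM layout. If `numactl` is installed, `numactl --hardware` shows similar per-node sizes.

```python
import glob
import re

# Matches per-node meminfo lines such as "Node 0 MemTotal:       16318372 kB"
MEMTOTAL_RE = re.compile(r"^Node\s+(\d+)\s+MemTotal:\s+(\d+)\s+kB", re.MULTILINE)


def node_memtotal_kb(meminfo_text):
    """Parse (node_id, MemTotal in kB) from one node's meminfo contents."""
    m = MEMTOTAL_RE.search(meminfo_text)
    if m is None:
        raise ValueError("no MemTotal line found")
    return int(m.group(1)), int(m.group(2))


def memoryless_nodes(totals):
    """Return sorted node IDs that report no memory, given {node: MemTotal_kB}."""
    return sorted(n for n, kb in totals.items() if kb == 0)


def read_sysfs_totals(pattern="/sys/devices/system/node/node*/meminfo"):
    """Collect MemTotal for every NUMA node exposed in sysfs (empty dict if none)."""
    totals = {}
    for path in glob.glob(pattern):
        with open(path) as f:
            node, kb = node_memtotal_kb(f.read())
        totals[node] = kb
    return totals


if __name__ == "__main__":
    totals = read_sysfs_totals()
    for node in sorted(totals):
        print("node %d: %d MiB" % (node, totals[node] // 1024))
    empty = memoryless_nodes(totals)
    if empty:
        print("NUMA nodes with no memory:", empty)
```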
> Cheers, Andreas
>
>> Oct 29 14:30:20 gpu-node8 kernel: cache: dm_rq_target_io, object size: 136, buffer size: 136, default order: 0, min order: 0
>> Oct 29 14:30:20 gpu-node8 kernel: node 1: slabs: 2, objs: 60, free: 0
>> Oct 29 14:30:20 gpu-node8 kernel: LustreError: 3097:0:(niobuf.c:994:ptlrpc_register_rqbd()) LNetMDAttach failed: -12;
>> Oct 29 14:30:20 gpu-node8 kernel: LustreError: 3097:0:(service.c:2551:ptlrpc_main()) Failed to post rqbd for ldlm_cbd on CPT 0: -1
>> Oct 29 14:30:20 gpu-node8 kernel: LustreError: 3091:0:(service.c:2917:ptlrpc_start_threads()) cannot start ldlm_cb thread #0_0: rc -1
>> Oct 29 14:30:20 gpu-node8 kernel: LustreError: 3091:0:(service.c:837:ptlrpc_register_service()) Failed to start threads for service ldlm_cbd: -1
>> Oct 29 14:30:20 gpu-node8 kernel: LustreError: 3091:0:(ldlm_lockd.c:3077:ldlm_setup()) failed to start service
>> Oct 29 14:30:20 gpu-node8 kernel: LustreError: 3091:0:(ldlm_lib.c:462:client_obd_setup()) ldlm_get_ref failed: -1
>> Oct 29 14:30:20 gpu-node8 kernel: LustreError: 3091:0:(obd_config.c:559:class_setup()) setup MGC10.0.1.70@tcp failed (-1)
>> Oct 29 14:30:20 gpu-node8 kernel: LustreError: 3091:0:(obd_mount.c:202:lustre_start_simple()) MGC10.0.1.70@tcp setup error -1
>> Oct 29 14:30:20 gpu-node8 kernel: LustreError: 3091:0:(obd_mount.c:1608:lustre_fill_super()) Unable to mount (-1)
>>
>>
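The first Lustre error in the log carries a raw negative errno value. On Linux, Python's `errno` table uses the same numbering as the kernel, so a small helper can translate it. Here `describe_rc` is a hypothetical decoding aid. It shows that the mount failure starts with an out-of-memory error from LNetMDAttach, consistent with the SLUB message just before it. Only the first code is decoded, because the meaning of the later -1 values depends on Lustre internals that this sketch does not model.

```python
import errno
import os


def describe_rc(rc):
    """Map a negative kernel-style return code to its errno name and message (Linux numbering assumed)."""
    if rc >= 0:
        return "%d (not an error)" % rc
    code = -rc
    name = errno.errorcode.get(code, "UNKNOWN")
    return "%d = -%s (%s)" % (rc, name, os.strerror(code))


if __name__ == "__main__":
    # From: (niobuf.c:994:ptlrpc_register_rqbd()) LNetMDAttach failed: -12
    print(describe_rc(-12))
```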
>> I've been scratching my head over this one. It could just be a kernel
>> bug, but we have 3 other identical servers running exactly the same
>> versions of CentOS 7 and the Lustre client, and I have no problems
>> with them.
>>
>> Some more info:
>>
>> [root@gpu-node8 ~]# uname -r
>> 3.10.0-1160.el7.x86_64
>>
>> [root@gpu-node8 ~]# lctl --version
>> lctl 2.12.7
>>
>> [root@gpu-node8 ~]# vmstat -m |grep dm_rq_target_io
>> dm_rq_target_io 60 60 136 30
>>
>> [root@gpu-node8 ~]# free -h
>>               total        used        free      shared  buff/cache   available
>> Mem:            31G        1.4G         29G         10M        117M         29G
>> Swap:           15G          0B         15G
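The slab figures in the log are internally consistent, as a quick calculation shows. Assuming 4 KiB base pages (the x86_64 default) and ignoring any leftover bytes at the end of a slab, one order-0 slab of 136-byte objects holds 4096 // 136 = 30 objects, so 2 slabs hold 60. That matches `objs: 60, free: 0` in the kernel log and the `60 60 136 30` line from `vmstat -m`. In other words, the cache was full, so SLUB needed a fresh slab, and that page allocation appears to be what failed. The function names are hypothetical.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB base pages, the x86_64 default


def objects_per_slab(buffer_size, order=0, page_size=PAGE_SIZE):
    """Objects that fit in one slab of 2**order pages (leftover tail bytes ignored)."""
    return (page_size << order) // buffer_size


def slab_summary(slabs, buffer_size, order=0):
    """Return (objects per slab, total objects) for a cache with the given number of slabs."""
    per_slab = objects_per_slab(buffer_size, order)
    return per_slab, slabs * per_slab


if __name__ == "__main__":
    # From the log: buffer size 136, default order 0, "node 1: slabs: 2"
    per_slab, total = slab_summary(slabs=2, buffer_size=136)
    print(per_slab, total)  # 30 60
```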
>>
>>
>> I've been experimenting with sysctl parameters, but I don't really
>> know what I'm doing, and they made no difference anyway:
>>
>> sysctl vm.overcommit_memory=1
>>
>> sysctl vm.min_free_kbytes=90112
>>
>> sysctl vm.overcommit_kbytes=90112
>>
>>
>> Any help would be greatly appreciated.
>>
>> Thanks!
>>
>> --
>> Julien REY
>>
>> Plate-forme RPBS
>> Modélisation Computationnelle des Interactions Protéines-Ligand (CMPLI)
>> Université de Paris
>> tel : 01 57 27 83 95
>
> --
> Andreas Dilger
> Lustre Principal Architect
> Whamcloud