<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<p>Hello Andreas,</p>
<p>Thank you for your prompt response. In the end I had also come to
suspect a hardware issue. I will try rearranging the DIMMs
and will be sure to get back to you if that resolves the issue.</p>
<p>Cheers, Julien.<br>
</p>
<div class="moz-cite-prefix">Le 30/10/2021 à 02:46, Andreas Dilger a
écrit :<br>
</div>
<blockquote type="cite"
cite="mid:1A43B64B-7498-44B3-AB1D-69FFAA195F75@ddn.com">
On Oct 29, 2021, at 07:39, Julien Rey via lustre-discuss &lt;<a
href="mailto:lustre-discuss@lists.lustre.org" class=""
moz-do-not-send="true">lustre-discuss@lists.lustre.org</a>&gt;
wrote:<br class="">
<div>
<blockquote type="cite" class=""><br
class="Apple-interchange-newline">
<div class="">
<div class="">Hello,<br class="">
<br class="">
This may not be directly related to Lustre, but here's
what I get when I try to mount our Lustre filesystem on
one of our compute nodes running CentOS 7:<br class="">
<br class="">
<br class="">
Oct 29 14:30:20 gpu-node8 kernel: SLUB: Unable to allocate
memory on node -1 (gfp=0x8050)<br class="">
</div>
</div>
</blockquote>
<div><br class="">
</div>
There doesn't look to be anything "wrong" here: -1 means "no
specific node", and the GFP mask 0x8050 is __GFP_ZERO | __GFP_IO |
__GFP_WAIT for this kernel.</div>
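<div><br class="">
</div>
<div>As a rough illustration only (this is not kernel or Lustre
tooling), the mask can be decoded with a short Python sketch. The bit
values are an assumption taken from the 3.10-era include/linux/gfp.h
and are different in other kernel versions; decode_gfp is a
hypothetical helper name.</div>
<pre>
```python
# Assumed bit values for a 3.10-era kernel (include/linux/gfp.h).
# Other kernel versions renumber and rename these flags.
GFP_BITS = {
    0x10: "__GFP_WAIT",
    0x40: "__GFP_IO",
    0x8000: "__GFP_ZERO",
}


def decode_gfp(mask):
    """Return the known flag names set in mask, lowest bit first.

    Any bits not listed in GFP_BITS are appended as one hex string.
    """
    names = [name for bit, name in sorted(GFP_BITS.items()) if mask & bit]
    unknown = mask
    for bit in GFP_BITS:
        unknown &= ~bit  # clear every bit we know how to name
    if unknown:
        names.append(hex(unknown))
    return names


print(" | ".join(decode_gfp(0x8050)))
```
</pre>
<div>Only the three flags named above are in the table, so any other
bit in a mask is reported as a raw hex value rather than guessed.</div>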
<div><br class="">
</div>
<div>One time I saw problems like this, it was because all the
DIMMs were installed on one socket of a dual-socket NUMA
motherboard, and no memory was available on the other socket,
but only some allocations failed.</div>
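<div><br class="">
</div>
<div>One possible way to check for that situation is to look at the
per-node memory totals that Linux exposes under
/sys/devices/system/node. The Python sketch below assumes the usual
"Node N MemTotal: X kB" layout of those meminfo files;
parse_node_meminfo and nodes_without_memory are hypothetical helper
names.</div>
<pre>
```python
import glob
import re


def parse_node_meminfo(text):
    """Parse 'Node N Key: value kB' lines into a {key: value_in_kB} dict."""
    info = {}
    for line in text.splitlines():
        m = re.match(r"Node\s+\d+\s+(\w+):\s+(\d+)", line)
        if m:
            info[m.group(1)] = int(m.group(2))
    return info


def nodes_without_memory(pattern="/sys/devices/system/node/node*/meminfo"):
    """Return the meminfo paths of NUMA nodes that report MemTotal of 0."""
    empty = []
    for path in sorted(glob.glob(pattern)):
        with open(path) as f:
            info = parse_node_meminfo(f.read())
        if info.get("MemTotal", 0) == 0:
            empty.append(path)
    return empty


# Lists the NUMA nodes with no memory; an empty list means none were found.
print(nodes_without_memory())
```
</pre>
<div>An empty list only means that no node reported zero memory; it
does not rule out a faulty DIMM.</div>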
<div><br class="">
</div>
<div>Cheers, Andreas</div>
<div><br class="">
<blockquote type="cite" class="">
<div class="">
<div class="">Oct 29 14:30:20 gpu-node8 kernel: cache:
dm_rq_target_io, object size: 136, buffer size: 136,
default order: 0, min order: 0<br class="">
Oct 29 14:30:20 gpu-node8 kernel: node 1: slabs: 2, objs:
60, free: 0<br class="">
Oct 29 14:30:20 gpu-node8 kernel: LustreError:
3097:0:(niobuf.c:994:ptlrpc_register_rqbd()) LNetMDAttach
failed: -12;<br class="">
Oct 29 14:30:20 gpu-node8 kernel: LustreError:
3097:0:(service.c:2551:ptlrpc_main()) Failed to post rqbd
for ldlm_cbd on CPT 0: -1<br class="">
Oct 29 14:30:20 gpu-node8 kernel: LustreError:
3091:0:(service.c:2917:ptlrpc_start_threads()) cannot
start ldlm_cb thread #0_0: rc -1<br class="">
Oct 29 14:30:20 gpu-node8 kernel: LustreError:
3091:0:(service.c:837:ptlrpc_register_service()) Failed to
start threads for service ldlm_cbd: -1<br class="">
Oct 29 14:30:20 gpu-node8 kernel: LustreError:
3091:0:(ldlm_lockd.c:3077:ldlm_setup()) failed to start
service<br class="">
Oct 29 14:30:20 gpu-node8 kernel: LustreError:
3091:0:(ldlm_lib.c:462:client_obd_setup()) ldlm_get_ref
failed: -1<br class="">
Oct 29 14:30:20 gpu-node8 kernel: LustreError:
3091:0:(obd_config.c:559:class_setup()) setup
MGC10.0.1.70@tcp failed (-1)<br class="">
Oct 29 14:30:20 gpu-node8 kernel: LustreError:
3091:0:(obd_mount.c:202:lustre_start_simple())
MGC10.0.1.70@tcp setup error -1<br class="">
Oct 29 14:30:20 gpu-node8 kernel: LustreError:
3091:0:(obd_mount.c:1608:lustre_fill_super()) Unable to
mount (-1)<br class="">
<br class="">
<br class="">
I've been scratching my head on this one: it
could just be a kernel bug, but we have 3 other identical
servers running the exact same versions of CentOS 7 and the
Lustre client, and I have no problems with them.<br class="">
<br class="">
Some more info:<br class="">
<br class="">
[root@gpu-node8 ~]# uname -r<br class="">
3.10.0-1160.el7.x86_64<br class="">
<br class="">
[root@gpu-node8 ~]# lctl --version<br class="">
lctl 2.12.7<br class="">
<br class="">
[root@gpu-node8 ~]# vmstat -m |grep dm_rq_target_io<br
class="">
dm_rq_target_io 60 60 136 30<br
class="">
<br class="">
[root@gpu-node8 ~]# free -h<br class="">
total used free shared
buff/cache available<br class="">
Mem: 31G 1.4G 29G 10M
117M 29G<br class="">
Swap: 15G 0B 15G<br class="">
<br class="">
<br class="">
I've been playing with the sysctl parameters, but I don't
really know what I'm doing, and it made no difference anyway:<br
class="">
<br class="">
sysctl vm.overcommit_memory=1<br class="">
<br class="">
sysctl vm.min_free_kbytes=90112<br class="">
<br class="">
sysctl vm.overcommit_kbytes=90112<br class="">
<br class="">
<br class="">
Any help would be greatly appreciated.<br class="">
<br class="">
Thanks!<br class="">
<br class="">
-- <br class="">
Julien REY<br class="">
<br class="">
Plate-forme RPBS<br class="">
Modélisation Computationnelle des Interactions
Protéines-Ligand (CMPLI)<br class="">
Université de Paris<br class="">
tel : 01 57 27 83 95<br class="">
<br class="">
</div>
</div>
</blockquote>
</div>
<br class="">
<div class="">
<div dir="auto" style="caret-color: rgb(0, 0, 0); color: rgb(0,
0, 0); letter-spacing: normal; text-align: start; text-indent:
0px; text-transform: none; white-space: normal; word-spacing:
0px; -webkit-text-stroke-width: 0px; text-decoration: none;
word-wrap: break-word; -webkit-nbsp-mode: space; line-break:
after-white-space;" class="">
<div dir="auto" style="caret-color: rgb(0, 0, 0); color:
rgb(0, 0, 0); letter-spacing: normal; text-align: start;
text-indent: 0px; text-transform: none; white-space: normal;
word-spacing: 0px; -webkit-text-stroke-width: 0px;
text-decoration: none; word-wrap: break-word;
-webkit-nbsp-mode: space; line-break: after-white-space;"
class="">
<div dir="auto" style="caret-color: rgb(0, 0, 0); color:
rgb(0, 0, 0); letter-spacing: normal; text-align: start;
text-indent: 0px; text-transform: none; white-space:
normal; word-spacing: 0px; -webkit-text-stroke-width: 0px;
text-decoration: none; word-wrap: break-word;
-webkit-nbsp-mode: space; line-break: after-white-space;"
class="">
<div dir="auto" style="caret-color: rgb(0, 0, 0); color:
rgb(0, 0, 0); letter-spacing: normal; text-align: start;
text-indent: 0px; text-transform: none; white-space:
normal; word-spacing: 0px; -webkit-text-stroke-width:
0px; text-decoration: none; word-wrap: break-word;
-webkit-nbsp-mode: space; line-break:
after-white-space;" class="">
<div dir="auto" style="caret-color: rgb(0, 0, 0); color:
rgb(0, 0, 0); letter-spacing: normal; text-align:
start; text-indent: 0px; text-transform: none;
white-space: normal; word-spacing: 0px;
-webkit-text-stroke-width: 0px; text-decoration: none;
word-wrap: break-word; -webkit-nbsp-mode: space;
line-break: after-white-space;" class="">
<div dir="auto" style="caret-color: rgb(0, 0, 0);
color: rgb(0, 0, 0); letter-spacing: normal;
text-align: start; text-indent: 0px; text-transform:
none; white-space: normal; word-spacing: 0px;
-webkit-text-stroke-width: 0px; text-decoration:
none; word-wrap: break-word; -webkit-nbsp-mode:
space; line-break: after-white-space;" class="">
<div>Cheers, Andreas</div>
<div>--</div>
<div>Andreas Dilger</div>
<div>Lustre Principal Architect</div>
<div>Whamcloud</div>
<div><br class="">
</div>
<div><br class="">
</div>
<div><br class="">
</div>
</div>
</div>
</div>
</div>
</div>
<br class="Apple-interchange-newline">
</div>
<br class="Apple-interchange-newline">
<br class="Apple-interchange-newline">
</div>
<br class="">
</blockquote>
<pre class="moz-signature" cols="72">--
Julien REY
Plate-forme RPBS
Modélisation Computationnelle des Interactions Protéines-Ligand (CMPLI)
Université de Paris
tel : 01 57 27 83 95</pre>
</body>
</html>