<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<p>I suppose you removed /etc/modprobe.d/lustre.conf
completely.</p>
<p>I only have the lnet service enabled at startup; I do not start
any lustre3 service. Also, I am running Lustre 2.12.0, not 2.14 (sorry),
so something might be different.<br>
</p>
<p>Did you start over with a clean configuration?<br>
</p>
<p>Did you reboot your system to make sure it picks up the new
config? At least for me, the lnet module sometimes does not unload
correctly.<br>
</p>
<p>I should also mention that in my setup I disabled discovery on
the OSSes as well, not only on the client side.<br>
</p>
<p>Generally it is not advisable to disable Multi-Rail unless you
have backward-compatibility issues with older Lustre peers.</p>
<p>But disabling discovery will also disable Multi-Rail.<br>
</p>
<p>You can try</p>
<p>lnetctl set discovery 0</p>
<p>as you already did, and then run</p>
<p>lnetctl export -b > /etc/lnet.conf</p>
<p>Check that discovery is set to 0 in the file; if it is not, edit
the file and set it to 0.<br>
</p>
<p>Then reboot and see if things change.</p>
<p>In any case, if you did not define any tcp interface in lnet.conf,
you should not see any tcp peers.<br>
</p>
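<p>The steps above can be sketched as a few shell functions. This is only a sketch under stated assumptions: lnetctl is in PATH, LNet is already up, and /etc/lnet.conf is the file imported at boot (for example by the lnet service). The helper names get_discovery, force_discovery_off, count_tcp_nids and disable_discovery_persistently are hypothetical, and the sed edit assumes the only "discovery:" key in the exported file is the one under global:.</p>

```shell
#!/usr/bin/env bash
# Sketch only. Assumptions: lnetctl is in PATH, LNet is already configured,
# and /etc/lnet.conf is the file imported at boot. Helper names are
# hypothetical; nothing runs until a function is called.

# Print the first "discovery:" value found in an lnetctl YAML export.
get_discovery() {
    awk '$1 == "discovery:" { print $2; exit }' "$1"
}

# Rewrite every "discovery: N" line to "discovery: 0", keeping indentation.
force_discovery_off() {
    sed -i -E 's/^([[:space:]]*discovery:)[[:space:]]*[0-9]+[[:space:]]*$/\1 0/' "$1"
}

# Count lines read from stdin that mention a tcp NID (e.g. "...@tcp"),
# for instance from "lnetctl peer show". Prints 0 when there are none.
count_tcp_nids() {
    grep -c '@tcp' || true
}

# The steps from the message: disable discovery at runtime, save the
# running configuration, make sure the saved file also says 0.
# Reboot afterwards so the saved file is what gets loaded.
disable_discovery_persistently() {
    local conf=${1:-/etc/lnet.conf}
    lnetctl set discovery 0 || return 1
    lnetctl export -b > "$conf" || return 1
    case "$(get_discovery "$conf")" in
        0) ;;
        "") echo "no discovery: key in $conf; add one under global:" >&2
            return 1 ;;
        *) force_discovery_off "$conf" ;;
    esac
    echo "discovery in $conf is now $(get_discovery "$conf")"
}
```

<p>For example: run disable_discovery_persistently /etc/lnet.conf, reboot, and then lnetctl peer show | count_tcp_nids should print 0 if no peer line mentions a tcp NID.</p>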
<p><br>
</p>
<div class="moz-cite-prefix">On 9/13/21 2:59 PM, Vicker, Darby J.
(JSC-EG111)[Jacobs Technology, Inc.] wrote:<br>
</div>
<blockquote type="cite"
cite="mid:C3E1F417-FD94-4C6E-9D64-FF13FFE5EABD@nasa.gov">
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta name="Generator" content="Microsoft Word 15 (filtered
medium)">
<style>@font-face
{font-family:Courier;
panose-1:0 0 0 0 0 0 0 0 0 0;}@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}@font-face
{font-family:"Times New Roman \(Body CS\)";
panose-1:2 11 6 4 2 2 2 2 2 4;}p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
font-size:12.0pt;
font-family:"Calibri",sans-serif;}a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}pre
{mso-style-priority:99;
mso-style-link:"HTML Preformatted Char";
margin:0in;
font-size:10.0pt;
font-family:"Courier New";}span.HTMLPreformattedChar
{mso-style-name:"HTML Preformatted Char";
mso-style-priority:99;
mso-style-link:"HTML Preformatted";
font-family:"Courier New";}span.EmailStyle22
{mso-style-type:personal-reply;
font-family:"Calibri",sans-serif;
color:windowtext;}.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;}div.WordSection1
{page:WordSection1;}</style>
<div class="WordSection1">
<p class="MsoNormal"><span style="font-size:11.0pt">Thanks,
Rick. I removed my lnet modprobe options and adapted my
lnet.conf file to:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal" style="margin-left:.5in"><span
style="font-size:11.0pt;font-family:Courier"># cat
/etc/lnet.conf
<o:p></o:p></span></p>
<p class="MsoNormal" style="margin-left:.5in"><span
style="font-size:11.0pt;font-family:Courier">ip2nets:<o:p></o:p></span></p>
<p class="MsoNormal" style="margin-left:.5in"><span
style="font-size:11.0pt;font-family:Courier">- net-spec:
o2ib1<o:p></o:p></span></p>
<p class="MsoNormal" style="margin-left:.5in"><span
style="font-size:11.0pt;font-family:Courier"> interfaces:<o:p></o:p></span></p>
<p class="MsoNormal" style="margin-left:.5in"><span
style="font-size:11.0pt;font-family:Courier"> 0: ib0<o:p></o:p></span></p>
<p class="MsoNormal" style="margin-left:.5in"><span
style="font-size:11.0pt;font-family:Courier">global:<o:p></o:p></span></p>
<p class="MsoNormal" style="margin-left:.5in"><span
style="font-size:11.0pt;font-family:Courier"> discovery:
0<o:p></o:p></span></p>
<p class="MsoNormal" style="margin-left:.5in"><span
style="font-size:11.0pt;font-family:Courier">#<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">Now "lnetctl
export" doesn't have any reference to NIDs on the other
networks, so that's good. However, I'm still seeing some
values that concern me:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal" style="margin-left:.5in"><span
style="font-size:11.0pt;font-family:Courier"># lnetctl
export | grep -e Multi -e discover | sort -u<o:p></o:p></span></p>
<p class="MsoNormal" style="margin-left:.5in"><span
style="font-size:11.0pt;font-family:Courier"> discovery:
1<o:p></o:p></span></p>
<p class="MsoNormal" style="margin-left:.5in"><span
style="font-size:11.0pt;font-family:Courier">
Multi-Rail: True<o:p></o:p></span></p>
<p class="MsoNormal" style="margin-left:.5in"><span
style="font-size:11.0pt;font-family:Courier">#<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">Any idea why
discovery is still 1 when I'm setting it to 0 in the
lnet.conf file? I'm a little concerned that with Multi-Rail
still True and discovery on, the client could still find its
way back to the TCP route. <o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<div style="border:none;border-top:solid #B5C4DF
1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><b><span style="color:black">From: </span></b><span
style="color:black">Riccardo Veraldi
<a class="moz-txt-link-rfc2396E" href="mailto:riccardo.veraldi@cnaf.infn.it">&lt;riccardo.veraldi@cnaf.infn.it&gt;</a><br>
<b>Date: </b>Monday, September 13, 2021 at 3:16 PM<br>
<b>To: </b>"Vicker, Darby J. (JSC-EG111)[Jacobs
Technology, Inc.]" <a class="moz-txt-link-rfc2396E" href="mailto:darby.vicker-1@nasa.gov">&lt;darby.vicker-1@nasa.gov&gt;</a>,
<a class="moz-txt-link-rfc2396E" href="mailto:lustre-discuss@lists.lustre.org">"lustre-discuss@lists.lustre.org"</a>
<a class="moz-txt-link-rfc2396E" href="mailto:lustre-discuss@lists.lustre.org">&lt;lustre-discuss@lists.lustre.org&gt;</a><br>
<b>Subject: </b>[EXTERNAL] Re: [lustre-discuss] Disabling
multi-rail dynamic discovery<o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
</div>
<p>I would put the configuration in /etc/lnet.conf and would no longer
use the older-style configuration in
<o:p></o:p></p>
<p><span style="font-family:Courier">/etc/modprobe.d/lustre.conf
</span><o:p></o:p></p>
<p class="MsoNormal">For example, my /etc/lnet.conf
contains: <o:p>
</o:p></p>
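<p><b><span style="font-family:Courier">ip2nets:<br>
&nbsp;- net-spec: o2ib<br>
&nbsp;&nbsp;&nbsp;interfaces:<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;0: ib0<br>
&nbsp;- net-spec: tcp<br>
&nbsp;&nbsp;&nbsp;interfaces:<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;0: enp24s0f0<br>
global:<br>
&nbsp;&nbsp;&nbsp;&nbsp;discovery: 0</span></b><o:p></o:p></p>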
<p>This is how I disabled auto discovery.<o:p></o:p></p>
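<p>A quick way to confirm that nothing in modprobe.d still competes with /etc/lnet.conf might look like the sketch below. It assumes the systemd unit shipped with Lustre is called lnet.service and imports /etc/lnet.conf when it starts; legacy_lnet_options and check_lnet_setup are hypothetical helper names.</p>

```shell
#!/usr/bin/env bash
# Sketch only. Assumption: the lnet.service unit shipped with Lustre
# imports /etc/lnet.conf at start. Helper names are hypothetical.

# Print any legacy "options lnet ..." lines (e.g. networks=...) in a
# modprobe config file. "options ko2iblnd ..." lines are not matched.
legacy_lnet_options() {
    grep -E '^[[:space:]]*options[[:space:]]+lnet[[:space:]]' "$1"
}

# Warn about leftover lnet module options in the given files, then show
# whether the lnet service is enabled at boot.
check_lnet_setup() {
    local f
    for f in "$@"; do
        if legacy_lnet_options "$f" > /dev/null; then
            echo "WARNING: $f still sets lnet module options:"
            legacy_lnet_options "$f"
        fi
    done
    systemctl is-enabled lnet.service
}
```

<p>For example: check_lnet_setup /etc/modprobe.d/*.conf. With the original client file quoted further down, it would flag the options lnet networks=o2ib1(ib0) line and leave the options ko2iblnd line alone.</p>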
<p>Regarding ko2iblnd, you can just use
/etc/modprobe.d/ko2iblnd.conf<o:p></o:p></p>
<p>Mine looks like this:<o:p></o:p></p>
<p><b><span style="font-family:&quot;Courier New&quot;">options
ko2iblnd peer_credits=128 peer_credits_hiw=64 credits=1024
ntx=2048 map_on_demand=256 fmr_pool_size=2048
fmr_flush_trigger=512 fmr_cache=1 conns_per_peer=4</span></b><o:p></o:p></p>
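<p>To confirm that the ko2iblnd options were actually picked up once the module is loaded, one option is to compare them with the values the module exposes in sysfs. This is a sketch assuming the parameters are readable under /sys/module/ko2iblnd/parameters/ (the usual location for exported module parameters); check_module_options is a hypothetical helper, and a parameter that is not exported there is reported as missing rather than wrong.</p>

```shell
#!/usr/bin/env bash
# Sketch only. Assumption: loaded module parameters are readable as files
# in a directory such as /sys/module/ko2iblnd/parameters/.
# check_module_options is a hypothetical helper name.

# Usage: check_module_options PARAM_DIR "options MODULE k1=v1 k2=v2 ..."
# Prints one status line per key=value pair; returns 1 if any value is
# missing or differs, 0 otherwise.
check_module_options() {
    local dir=$1 line=$2 kv key want have rc=0
    # Split the line into words and drop the leading "options MODULE".
    set -- $line
    shift 2
    for kv in "$@"; do
        key=${kv%%=*}
        want=${kv#*=}
        if [ ! -r "$dir/$key" ]; then
            echo "missing  $key"
            rc=1
            continue
        fi
        have=$(cat "$dir/$key")
        if [ "$have" = "$want" ]; then
            echo "ok       $key=$have"
        else
            echo "mismatch $key: want $want, have $have"
            rc=1
        fi
    done
    return $rc
}
```

<p>For example: check_module_options /sys/module/ko2iblnd/parameters "options ko2iblnd peer_credits=128 credits=1024". A mismatch is worth a second look, since the module may also adjust some values itself at load time.</p>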
<p>Hope it helps.<o:p></o:p></p>
<p>Rick<o:p></o:p></p>
<p><o:p> </o:p></p>
<div>
<p class="MsoNormal">On 9/13/21 1:53 PM, Vicker, Darby J.
(JSC-EG111)[Jacobs Technology, Inc.] via lustre-discuss
wrote:<o:p></o:p></p>
</div>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<p class="MsoNormal"><span style="font-size:11.0pt">Hello,</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt">I would
like to know how to turn off auto discovery of peers on a
client. This seems like it should be straightforward, but
we can't get it to work. Please fill me in on what I'm
missing.
</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt">We
recently upgraded our servers to 2.14. Our servers are
multi-homed (one TCP network and two separate IB networks), but
we want them to be single-rail. On one of our clusters we
are still using the 2.12.6 client, and it uses one of the
IB networks for Lustre. The modprobe file from one of the
client nodes:</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> </span><o:p></o:p></p>
<p class="MsoNormal" style="margin-left:.5in"><span
style="font-size:11.0pt;font-family:Courier"># cat
/etc/modprobe.d/lustre.conf
</span><o:p></o:p></p>
<p class="MsoNormal" style="margin-left:.5in"><span
style="font-size:11.0pt;font-family:Courier">options lnet
networks=o2ib1(ib0)</span><o:p></o:p></p>
<p class="MsoNormal" style="margin-left:.5in"><span
style="font-size:11.0pt;font-family:Courier">options
ko2iblnd map_on_demand=32</span><o:p></o:p></p>
<p class="MsoNormal" style="margin-left:.5in"><span
style="font-size:11.0pt;font-family:Courier">#</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt">The client
does have a route to the TCP network. This is intended to
allow jobs on the compute nodes to access license
servers, not for any serious I/O. We recently discovered
that, due to some instability in the IB fabric, the client
was trying to fail over to tcp:</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> </span><o:p></o:p></p>
<p class="MsoNormal"><span
style="font-size:10.0pt;font-family:&quot;Courier New&quot;"># dmesg | grep Lustre</span><o:p></o:p></p>
<p class="MsoNormal"><span
style="font-size:10.0pt;font-family:&quot;Courier New&quot;">[ 250.205912] Lustre: Lustre: Build Version:
2.12.6</span><o:p></o:p></p>
<p class="MsoNormal"><span
style="font-size:10.0pt;font-family:&quot;Courier New&quot;">[ 255.886086] Lustre: Mounted scratch-client</span><o:p></o:p></p>
<p class="MsoNormal"><span
style="font-size:10.0pt;font-family:&quot;Courier New&quot;">[ 287.247547] Lustre:
3472:0:(client.c:2146:ptlrpc_expire_one_request()) @@@
Request sent has timed out for sent delay: [sent
1630699139/real 0] req@ffff98deb9358480
x1709911947878336/t0(0) o9-&gt;hpfs-fsl-OST0001-osc-ffff9880cfb80000@192.52.98.33@tcp:28/4
lens 224/224 e 0 to 1 dl 1630699145 ref 2 fl
Rpc:XN/0/ffffffff rc 0/-1</span><o:p></o:p></p>
<p class="MsoNormal"><span
style="font-size:10.0pt;font-family:&quot;Courier New&quot;">[ 739.832744] Lustre:
3526:0:(client.c:2146:ptlrpc_expire_one_request()) @@@
Request sent has timed out for sent delay: [sent
1630699591/real 0] req@ffff98deb935da00
x1709911947883520/t0(0) o400-&gt;scratch-MDT0000-mdc-ffff98b0f1fc0800@192.52.98.31@tcp:12/10
lens 224/224 e 0 to 1 dl 1630699598 ref 2 fl
Rpc:XN/0/ffffffff rc 0/-1</span><o:p></o:p></p>
<p class="MsoNormal"><span
style="font-size:10.0pt;font-family:&quot;Courier New&quot;">[ 739.832755] Lustre:
3526:0:(client.c:2146:ptlrpc_expire_one_request()) Skipped
5 previous similar messages</span><o:p></o:p></p>
<p class="MsoNormal"><span
style="font-size:10.0pt;font-family:&quot;Courier New&quot;">[ 739.832762] LustreError: 166-1:
MGC10.150.100.30@o2ib1: Connection to MGS (at
192.52.98.30@tcp) was lost; in progress operations using
this service will fail</span><o:p></o:p></p>
<p class="MsoNormal"><span
style="font-size:10.0pt;font-family:&quot;Courier New&quot;">[ 739.832769] Lustre:
hpfs-fsl-MDT0000-mdc-ffff9880cfb80000: Connection to
hpfs-fsl-MDT0000 (at 192.52.98.30@tcp) was lost; in
progress operations using this service will wait for
recovery to complete</span><o:p></o:p></p>
<p class="MsoNormal"><span
style="font-size:10.0pt;font-family:&quot;Courier New&quot;">[ 1090.978619] LustreError: 167-0:
scratch-MDT0000-mdc-ffff98b0f1fc0800: This client was
evicted by scratch-MDT0000; in progress operations using
this service will fail.</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt">I'm pretty
sure this is due to the auto discovery. Again, from a
client:</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> </span><o:p></o:p></p>
<pre style="margin-left:.5in"># lnetctl export | grep -e Multi -e discover | sort -u<o:p></o:p></pre>
<pre style="margin-left:.5in"> discovery: 0<o:p></o:p></pre>
<pre style="margin-left:.5in"> Multi-Rail: True<o:p></o:p></pre>
<pre style="margin-left:.5in"># <o:p></o:p></pre>
<p class="MsoNormal"><span style="font-size:11.0pt"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt">We want to
restrict Lustre to only the IB NID, but it's not clear
exactly how to do that.
</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> </span><o:p></o:p></p>
<p class="MsoNormal" style="margin-left:.5in"><span
style="font-size:11.0pt">Here is one attempt:<br>
<br>
<br>
</span><span style="font-size:11.0pt;font-family:Courier">[root@r1i1n18
lnet]# service lustre3 stop</span><o:p></o:p></p>
<p class="MsoNormal" style="margin-left:.5in"><span
style="font-size:11.0pt;font-family:Courier">Shutting down
lustre mounts
</span><o:p></o:p></p>
<p class="MsoNormal" style="margin-left:.5in"><span
style="font-size:11.0pt;font-family:Courier">Lustre
modules successfully unloaded</span><o:p></o:p></p>
<p class="MsoNormal" style="margin-left:.5in"><span
style="font-size:11.0pt;font-family:Courier">[root@r1i1n18
lnet]# lsmod | grep lnet</span><o:p></o:p></p>
<p class="MsoNormal" style="margin-left:.5in"><span
style="font-size:11.0pt;font-family:Courier">[root@r1i1n18
lnet]# cat /etc/lnet.conf
</span><o:p></o:p></p>
<p class="MsoNormal" style="margin-left:.5in"><span
style="font-size:11.0pt;font-family:Courier">global:</span><o:p></o:p></p>
<p class="MsoNormal" style="margin-left:.5in"><span
style="font-size:11.0pt;font-family:Courier">
discovery: 0</span><o:p></o:p></p>
<p class="MsoNormal" style="margin-left:.5in"><span
style="font-size:11.0pt;font-family:Courier">[root@r1i1n18
lnet]# service lustre3 start</span><o:p></o:p></p>
<p class="MsoNormal" style="margin-left:.5in"><span
style="font-size:11.0pt;font-family:Courier">Mounting
/ephemeral... done.</span><o:p></o:p></p>
<p class="MsoNormal" style="margin-left:.5in"><span
style="font-size:11.0pt;font-family:Courier">Mounting
/nobackup... done.</span><o:p></o:p></p>
<p class="MsoNormal" style="margin-left:.5in"><span
style="font-size:11.0pt;font-family:Courier">[root@r1i1n18
lnet]# lnetctl export | grep -e Multi -e discover | sort
-u</span><o:p></o:p></p>
<p class="MsoNormal" style="margin-left:.5in"><span
style="font-size:11.0pt;font-family:Courier">
discovery: 1</span><o:p></o:p></p>
<p class="MsoNormal" style="margin-left:.5in"><span
style="font-size:11.0pt;font-family:Courier">
Multi-Rail: True</span><o:p></o:p></p>
<p class="MsoNormal" style="margin-left:.5in"><span
style="font-size:11.0pt;font-family:Courier">[root@r1i1n18
lnet]#</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt">And a
similar attempt (same lnet.conf file), but trying to turn
off the discovery before doing the mounts:</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> </span><o:p></o:p></p>
<pre style="margin-left:.5in">[root@r1i1n18 lnet]# service lustre3 stop<o:p></o:p></pre>
<pre style="margin-left:.5in">Shutting down lustre mounts <o:p></o:p></pre>
<pre style="margin-left:.5in">Lustre modules successfully unloaded<o:p></o:p></pre>
<pre style="margin-left:.5in">[root@r1i1n18 lnet]# modprobe lnet<o:p></o:p></pre>
<pre style="margin-left:.5in">[root@r1i1n18 lnet]# lnetctl set discovery 0<o:p></o:p></pre>
<pre style="margin-left:.5in">[root@r1i1n18 lnet]# service lustre3 start<o:p></o:p></pre>
<pre style="margin-left:.5in">Mounting /ephemeral... done.<o:p></o:p></pre>
<pre style="margin-left:.5in">Mounting /nobackup... done.<o:p></o:p></pre>
<pre style="margin-left:.5in">[root@r1i1n18 lnet]# lnetctl export | grep -e Multi -e discover | sort -u<o:p></o:p></pre>
<pre style="margin-left:.5in"> discovery: 0<o:p></o:p></pre>
<pre style="margin-left:.5in"> Multi-Rail: True<o:p></o:p></pre>
<pre style="margin-left:.5in">[root@r1i1n18 lnet]# <o:p></o:p></pre>
<p class="MsoNormal"><span style="font-size:11.0pt"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt">If someone
can point me in the right direction, I'd appreciate it.
</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt">Thanks,</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt">Darby</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><br>
<br>
<o:p></o:p></span></p>
<pre>_______________________________________________<o:p></o:p></pre>
<pre>lustre-discuss mailing list<o:p></o:p></pre>
<pre><a href="mailto:lustre-discuss@lists.lustre.org" moz-do-not-send="true">lustre-discuss@lists.lustre.org</a><o:p></o:p></pre>
<pre><a href="http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org" moz-do-not-send="true">http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org</a><o:p></o:p></pre>
</blockquote>
</div>
</blockquote>
</body>
</html>