<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<p>I would put the configuration in /etc/lnet.conf and stop using
the older-style configuration in <br>
</p>
<p><span style="font-size:11.0pt;font-family:Courier">/etc/modprobe.d/lustre.conf
<br>
</span></p>
<span style="font-size: 11pt;">For example, my /etc/lnet.conf
configuration contains:</span>
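<p><b><span style="font-size:11.0pt;font-family:Courier">ip2nets:<br>
&nbsp;- net-spec: o2ib<br>
&nbsp;&nbsp;&nbsp;interfaces:<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;0: ib0<br>
&nbsp;- net-spec: tcp<br>
&nbsp;&nbsp;&nbsp;interfaces:<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;0: enp24s0f0<br>
global:<br>
&nbsp;&nbsp;&nbsp;&nbsp;discovery: 0</span></b></p>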
<p><span style="font-size: 11pt;">The global section sets discovery: 0, which is how I disabled auto discovery.</span></p>
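<p><span style="font-size: 11pt;">As far as I know, /etc/lnet.conf only takes effect when something imports it. Here is a rough sketch, assuming LNet is not already brought up by the lnet systemd service and that the site's own init script does not import the file itself. It loads LNet, applies the file, and shows the global settings before any Lustre mount is attempted. The ordering relative to your mount script is an assumption to adapt, not a tested recipe:</span></p>
<pre># Sketch only: run as root, before mounting any Lustre filesystem.
modprobe lnet                    # load the LNet module
lnetctl lnet configure           # bring LNet up without taking networks from module parameters
lnetctl import /etc/lnet.conf    # apply the YAML configuration, including "discovery: 0"
lnetctl global show              # the output should list "discovery: 0"
</pre>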
<p><span style="font-size: 11pt;">Regarding ko2iblnd, you can just use
/etc/modprobe.d/ko2iblnd.conf</span></p>
<p><span style="font-size: 11pt;">Mine looks like this:</span></p>
<p><span style="font-size: 11pt;"><b><font face="Courier New,
Courier, monospace">options ko2iblnd peer_credits=128
peer_credits_hiw=64 credits=1024 ntx=2048 map_on_demand=256
fmr_pool_size=2048 fmr_flush_trigger=512 fmr_cache=1
conns_per_peer=4</font></b><br>
</span></p>
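<p><span style="font-size: 11pt;">If a client has to stay on the module-option style for a while, discovery can, as far as I know, also be switched off at module load time through the lnet module parameter lnet_peer_discovery_disabled. A sketch, reusing the networks line from the client file quoted below. The file name and putting both options on one line are assumptions:</span></p>
<pre># /etc/modprobe.d/lustre.conf (sketch, not a tested configuration)
options lnet networks=o2ib1(ib0) lnet_peer_discovery_disabled=1
</pre>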
<p><span style="font-size: 11pt;">Hope it helps.</span></p>
<p><span style="font-size: 11pt;">Rick</span></p>
<div class="moz-cite-prefix">On 9/13/21 1:53 PM, Vicker, Darby J.
(JSC-EG111)[Jacobs Technology, Inc.] via lustre-discuss wrote:<br>
</div>
<blockquote type="cite"
cite="mid:F8B5C659-2F22-4C8A-B464-3D86EA5F1EEA@nasa.gov">
<style>@font-face
{font-family:Courier;
panose-1:0 0 0 0 0 0 0 0 0 0;}@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}@font-face
{font-family:"Times New Roman \(Body CS\)";
panose-1:2 11 6 4 2 2 2 2 2 4;}p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
font-size:12.0pt;
font-family:"Calibri",sans-serif;}pre
{mso-style-priority:99;
mso-style-link:"HTML Preformatted Char";
margin:0in;
margin-bottom:.0001pt;
font-size:10.0pt;
font-family:"Courier New";}span.EmailStyle17
{mso-style-type:personal-compose;
font-family:"Calibri",sans-serif;
color:windowtext;}span.HTMLPreformattedChar
{mso-style-name:"HTML Preformatted Char";
mso-style-priority:99;
mso-style-link:"HTML Preformatted";
font-family:"Courier New";}.MsoChpDefault
{mso-style-type:export-only;
font-size:12.0pt;
font-family:"Calibri",sans-serif;}div.WordSection1
{page:WordSection1;}</style>
<div class="WordSection1">
<p class="MsoNormal"><span style="font-size:11.0pt">Hello,<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">I would like
to know how to turn off auto discovery of peers on a
client. This seems like it should be straightforward, but
we can't get it to work. Please fill me in on what I'm
missing.
<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">We recently
upgraded our servers to 2.14. Our servers are multi-homed
(1 tcp network and 2 separate IB networks) but we want them
to be single rail. On one of our clusters we are still
using the 2.12.6 client and it uses one of the IB networks
for lustre. The modprobe file from one of the client nodes:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal" style="margin-left:.5in"><span
style="font-size:11.0pt;font-family:Courier"># cat
/etc/modprobe.d/lustre.conf
<o:p></o:p></span></p>
<p class="MsoNormal" style="margin-left:.5in"><span
style="font-size:11.0pt;font-family:Courier">options lnet
networks=o2ib1(ib0)<o:p></o:p></span></p>
<p class="MsoNormal" style="margin-left:.5in"><span
style="font-size:11.0pt;font-family:Courier">options
ko2iblnd map_on_demand=32<o:p></o:p></span></p>
<p class="MsoNormal" style="margin-left:.5in"><span
style="font-size:11.0pt;font-family:Courier">#<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">The client
does have a route to the TCP network. This is intended to
allow jobs on the compute nodes to access license servers,
not for any serious I/O. We recently discovered that, due to
some instability in the IB fabric, the client was trying to
fail over to tcp:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span
style="font-size:10.0pt;font-family:&quot;Courier New&quot;">#
dmesg | grep Lustre<o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:10.0pt;font-family:&quot;Courier New&quot;">[
250.205912] Lustre: Lustre: Build Version: 2.12.6<o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:10.0pt;font-family:&quot;Courier New&quot;">[
255.886086] Lustre: Mounted scratch-client<o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:10.0pt;font-family:&quot;Courier New&quot;">[
287.247547] Lustre:
3472:0:(client.c:2146:ptlrpc_expire_one_request()) @@@
Request sent has timed out for sent delay: [sent
1630699139/real 0] req@ffff98deb9358480
x1709911947878336/t0(0)
o9-&gt;hpfs-fsl-OST0001-osc-ffff9880cfb80000@192.52.98.33@tcp:28/4
lens 224/224 e 0 to 1 dl 1630699145 ref 2 fl
Rpc:XN/0/ffffffff rc 0/-1<o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:10.0pt;font-family:&quot;Courier New&quot;">[
739.832744] Lustre:
3526:0:(client.c:2146:ptlrpc_expire_one_request()) @@@
Request sent has timed out for sent delay: [sent
1630699591/real 0] req@ffff98deb935da00
x1709911947883520/t0(0)
o400-&gt;scratch-MDT0000-mdc-ffff98b0f1fc0800@192.52.98.31@tcp:12/10
lens 224/224 e 0 to 1 dl 1630699598 ref 2 fl
Rpc:XN/0/ffffffff rc 0/-1<o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:10.0pt;font-family:&quot;Courier New&quot;">[
739.832755] Lustre:
3526:0:(client.c:2146:ptlrpc_expire_one_request()) Skipped 5
previous similar messages<o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:10.0pt;font-family:&quot;Courier New&quot;">[
739.832762] LustreError: 166-1: MGC10.150.100.30@o2ib1:
Connection to MGS (at 192.52.98.30@tcp) was lost; in
progress operations using this service will fail<o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:10.0pt;font-family:&quot;Courier New&quot;">[
739.832769] Lustre: hpfs-fsl-MDT0000-mdc-ffff9880cfb80000:
Connection to hpfs-fsl-MDT0000 (at 192.52.98.30@tcp) was
lost; in progress operations using this service will wait
for recovery to complete<o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:10.0pt;font-family:&quot;Courier New&quot;">[
1090.978619] LustreError: 167-0:
scratch-MDT0000-mdc-ffff98b0f1fc0800: This client was
evicted by scratch-MDT0000; in progress operations using
this service will fail.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">I'm pretty
sure this is due to the auto discovery. Again, from a
client:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<pre style="margin-left:.5in"># lnetctl export | grep -e Multi -e discover | sort -u<o:p></o:p></pre>
<pre style="margin-left:.5in"> discovery: 0<o:p></o:p></pre>
<pre style="margin-left:.5in"> Multi-Rail: True<o:p></o:p></pre>
<pre style="margin-left:.5in"># <o:p></o:p></pre>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">We want to
restrict lustre to only the IB NID, but it's not clear exactly
how to do that.
<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal" style="margin-left:.5in"><span
style="font-size:11.0pt">Here is one attempt:<br>
<br>
<br>
</span><span style="font-size:11.0pt;font-family:Courier">[root@r1i1n18
lnet]# service lustre3 stop<o:p></o:p></span></p>
<p class="MsoNormal" style="margin-left:.5in"><span
style="font-size:11.0pt;font-family:Courier">Shutting down
lustre mounts
<o:p></o:p></span></p>
<p class="MsoNormal" style="margin-left:.5in"><span
style="font-size:11.0pt;font-family:Courier">Lustre modules
successfully unloaded<o:p></o:p></span></p>
<p class="MsoNormal" style="margin-left:.5in"><span
style="font-size:11.0pt;font-family:Courier">[root@r1i1n18
lnet]# lsmod | grep lnet<o:p></o:p></span></p>
<p class="MsoNormal" style="margin-left:.5in"><span
style="font-size:11.0pt;font-family:Courier">[root@r1i1n18
lnet]# cat /etc/lnet.conf
<o:p></o:p></span></p>
<p class="MsoNormal" style="margin-left:.5in"><span
style="font-size:11.0pt;font-family:Courier">global:<o:p></o:p></span></p>
<p class="MsoNormal" style="margin-left:.5in"><span
style="font-size:11.0pt;font-family:Courier"> discovery:
0<o:p></o:p></span></p>
<p class="MsoNormal" style="margin-left:.5in"><span
style="font-size:11.0pt;font-family:Courier">[root@r1i1n18
lnet]# service lustre3 start<o:p></o:p></span></p>
<p class="MsoNormal" style="margin-left:.5in"><span
style="font-size:11.0pt;font-family:Courier">Mounting
/ephemeral... done.<o:p></o:p></span></p>
<p class="MsoNormal" style="margin-left:.5in"><span
style="font-size:11.0pt;font-family:Courier">Mounting
/nobackup... done.<o:p></o:p></span></p>
<p class="MsoNormal" style="margin-left:.5in"><span
style="font-size:11.0pt;font-family:Courier">[root@r1i1n18
lnet]# lnetctl export | grep -e Multi -e discover | sort -u<o:p></o:p></span></p>
<p class="MsoNormal" style="margin-left:.5in"><span
style="font-size:11.0pt;font-family:Courier"> discovery:
1<o:p></o:p></span></p>
<p class="MsoNormal" style="margin-left:.5in"><span
style="font-size:11.0pt;font-family:Courier">
Multi-Rail: True<o:p></o:p></span></p>
<p class="MsoNormal" style="margin-left:.5in"><span
style="font-size:11.0pt;font-family:Courier">[root@r1i1n18
lnet]#<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">And a
similar attempt (same lnet.conf file), but trying to turn
off the discovery before doing the mounts:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<pre style="margin-left:.5in">[root@r1i1n18 lnet]# service lustre3 stop<o:p></o:p></pre>
<pre style="margin-left:.5in">Shutting down lustre mounts <o:p></o:p></pre>
<pre style="margin-left:.5in">Lustre modules successfully unloaded<o:p></o:p></pre>
<pre style="margin-left:.5in">[root@r1i1n18 lnet]# modprobe lnet<o:p></o:p></pre>
<pre style="margin-left:.5in">[root@r1i1n18 lnet]# lnetctl set discovery 0<o:p></o:p></pre>
<pre style="margin-left:.5in">[root@r1i1n18 lnet]# service lustre3 start<o:p></o:p></pre>
<pre style="margin-left:.5in">Mounting /ephemeral... done.<o:p></o:p></pre>
<pre style="margin-left:.5in">Mounting /nobackup... done.<o:p></o:p></pre>
<pre style="margin-left:.5in">[root@r1i1n18 lnet]# lnetctl export | grep -e Multi -e discover | sort -u<o:p></o:p></pre>
<pre style="margin-left:.5in"> discovery: 0<o:p></o:p></pre>
<pre style="margin-left:.5in"> Multi-Rail: True<o:p></o:p></pre>
<pre style="margin-left:.5in">[root@r1i1n18 lnet]# <o:p></o:p></pre>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">If someone
can point me in the right direction, I'd appreciate it.
<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">Thanks,<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">Darby<o:p></o:p></span></p>
</div>
<br>
<fieldset class="mimeAttachmentHeader"></fieldset>
<pre class="moz-quote-pre" wrap="">_______________________________________________
lustre-discuss mailing list
<a class="moz-txt-link-abbreviated" href="mailto:lustre-discuss@lists.lustre.org">lustre-discuss@lists.lustre.org</a>
<a class="moz-txt-link-freetext" href="http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org">http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org</a>
</pre>
</blockquote>
</body>
</html>