[lustre-discuss] Disabling multi-rail dynamic discovery

Vicker, Darby J. (JSC-EG111)[Jacobs Technology, Inc.] darby.vicker-1 at nasa.gov
Mon Sep 13 13:53:24 PDT 2021


Hello,

I would like to know how to turn off auto discovery of peers on a client.  This seems like it should be straightforward, but we can't get it to work.  Please fill me in on what I'm missing.

We recently upgraded our servers to 2.14.  Our servers are multi-homed (one TCP network and two separate IB networks), but we want them to be single-rail.  One of our clusters is still using the 2.12.6 client, and it uses one of the IB networks for Lustre.  The modprobe file from one of the client nodes:


# cat /etc/modprobe.d/lustre.conf
options lnet networks=o2ib1(ib0)
options ko2iblnd map_on_demand=32
#
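
For what it's worth, I believe the ip2nets form of the same restriction would look roughly like the line below (the address pattern is just an illustration of our IB subnet), though as I understand it this only controls which local NIDs the client configures, not which peer NIDs get discovered:


options lnet ip2nets="o2ib1(ib0) 10.150.*.*"
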


The client does have a route to the TCP network.  This is intended to allow jobs on the compute nodes to reach license servers, not for any serious I/O.  We recently discovered that, due to some instability in the IB fabric, the client was trying to fail over to tcp:


# dmesg | grep Lustre
[  250.205912] Lustre: Lustre: Build Version: 2.12.6
[  255.886086] Lustre: Mounted scratch-client
[  287.247547] Lustre: 3472:0:(client.c:2146:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1630699139/real 0]  req@ffff98deb9358480 x1709911947878336/t0(0) o9->hpfs-fsl-OST0001-osc-ffff9880cfb80000@192.52.98.33@tcp:28/4 lens 224/224 e 0 to 1 dl 1630699145 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
[  739.832744] Lustre: 3526:0:(client.c:2146:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1630699591/real 0]  req@ffff98deb935da00 x1709911947883520/t0(0) o400->scratch-MDT0000-mdc-ffff98b0f1fc0800@192.52.98.31@tcp:12/10 lens 224/224 e 0 to 1 dl 1630699598 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
[  739.832755] Lustre: 3526:0:(client.c:2146:ptlrpc_expire_one_request()) Skipped 5 previous similar messages
[  739.832762] LustreError: 166-1: MGC10.150.100.30@o2ib1: Connection to MGS (at 192.52.98.30@tcp) was lost; in progress operations using this service will fail
[  739.832769] Lustre: hpfs-fsl-MDT0000-mdc-ffff9880cfb80000: Connection to hpfs-fsl-MDT0000 (at 192.52.98.30@tcp) was lost; in progress operations using this service will wait for recovery to complete
[ 1090.978619] LustreError: 167-0: scratch-MDT0000-mdc-ffff98b0f1fc0800: This client was evicted by scratch-MDT0000; in progress operations using this service will fail.


I'm pretty sure this is due to the auto discovery.  Again, from a client:



# lnetctl export | grep -e Multi -e discover | sort -u
    discovery: 0
      Multi-Rail: True
#
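
I assume the tcp NIDs that discovery has added for the server peers would also show up on the client under something like:


# lnetctl peer show


but the "Multi-Rail: True" above already suggests discovery has run against the servers.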


We want to restrict Lustre to only the IB NID, but it's not clear exactly how to do that.

Here is one attempt:


[root@r1i1n18 lnet]# service lustre3 stop
Shutting down lustre mounts
Lustre modules successfully unloaded
[root@r1i1n18 lnet]# lsmod | grep lnet
[root@r1i1n18 lnet]# cat /etc/lnet.conf
global:
    discovery: 0
[root@r1i1n18 lnet]# service lustre3 start
Mounting /ephemeral... done.
Mounting /nobackup... done.
[root@r1i1n18 lnet]# lnetctl export | grep -e Multi -e discover | sort -u
    discovery: 1
      Multi-Rail: True
[root@r1i1n18 lnet]#


And here is a similar attempt (same lnet.conf file), but trying to turn off discovery before doing the mounts:



[root@r1i1n18 lnet]# service lustre3 stop
Shutting down lustre mounts
Lustre modules successfully unloaded
[root@r1i1n18 lnet]# modprobe lnet
[root@r1i1n18 lnet]# lnetctl set discovery 0
[root@r1i1n18 lnet]# service lustre3 start
Mounting /ephemeral... done.
Mounting /nobackup... done.
[root@r1i1n18 lnet]# lnetctl export | grep -e Multi -e discover | sort -u
    discovery: 0
      Multi-Rail: True
[root@r1i1n18 lnet]#
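
The next thing I'm considering is disabling discovery through a module parameter so that it is off before anything else touches LNet, i.e. adding the line below to the modprobe file (I haven't confirmed yet that this is the right knob on a 2.12.6 client):


options lnet lnet_peer_discovery_disabled=1


But I'd still like to understand why the lnet.conf and lnetctl settings above don't stick.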

If someone can point me in the right direction, I'd appreciate it.

Thanks,
Darby