[lustre-discuss] Cannot mount Lustre from clients any longer

Miguel Santos Novoa miguelsn at met.no
Thu Jun 27 05:16:13 PDT 2019


Hey guys,

For the last couple of weeks we have been adding and removing OSTs, and we
were also running tests with a client on Lustre version 2.12; that client is
our main hypothesis for the cause of the problem, but we are not sure what
is actually triggering this behavior.

None of our clients can mount Lustre any longer, although the existing
mounts are still serving data and no other component seems to be affected.
Given the nature and importance of the filesystem, we have not rebooted the
MDS/MDT server and would prefer not to try.

There is no firewall between client and server, and I can reach the MDS
with lctl ping <mds server>.

From a client, we execute the following *command*:
# mount -t lustre mds-b1.met.no@tcp:mds-b2.met.no@tcp:/WATZMANN/storeB
/lustre/storeB  -o rw,localflock,lazystatfs --verbose

We get the following error on *standard output*:
arg[0] = /sbin/mount.lustre
arg[1] = -v
arg[2] = -o
arg[3] = rw,localflock,lazystatfs
arg[4] = mds-b1.met.no@tcp:mds-b2.met.no@tcp:/WATZMANN/storeB
arg[5] = /lustre/storeB
source = mds-b1.met.no@tcp:mds-b2.met.no@tcp:/WATZMANN/storeB
(157.249.162.240@tcp:157.249.162.221@tcp:/WATZMANN/storeB), target =
/lustre/storeB
options = rw,localflock,lazystatfs
mounting device 157.249.162.240@tcp:157.249.162.221@tcp:/WATZMANN/storeB at
/lustre/storeB, flags=0x1000000
options=localflock,lazystatfs,device=157.249.162.240@tcp:157.249.162.221@tcp:/WATZMANN/storeB
mount.lustre: mount mds-b1.met.no@tcp:mds-b2.met.no@tcp:/WATZMANN/storeB at
/lustre/storeB failed: Function not implemented retries left: 0
mount.lustre: mount mds-b1.met.no@tcp:mds-b2.met.no@tcp:/WATZMANN/storeB at
/lustre/storeB failed: Function not implemented
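If it helps anyone reading: "Function not implemented" is the Linux message
for errno 38 (ENOSYS), which matches the rc = -38 in the client syslog
below. We double-checked the mapping with a quick one-liner (nothing
Lustre-specific, just standard Linux errno numbering):

```shell
# Look up errno 38 by name and message (Linux errno numbering assumed)
python3 -c 'import errno, os; print(errno.errorcode[38], "=", os.strerror(38))'
# → ENOSYS = Function not implemented
```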

*Syslog in the client* caught this:
Jun 27 11:23:31 r720xd-85z3zz1-ar-compute kernel: [  332.494299] Lustre:
WATZMANN: root_squash is set to 44052:44052
Jun 27 11:23:31 r720xd-85z3zz1-ar-compute kernel: [  332.537552] Lustre:
WATZMANN: nosquash_nids set to 157.249.160.140@tcp 157.249.162.115@tcp
Jun 27 11:23:31 r720xd-85z3zz1-ar-compute kernel: [  332.537594] Lustre:
setting import WATZMANN-OST0061_UUID INACTIVE by administrator request
Jun 27 11:23:31 r720xd-85z3zz1-ar-compute kernel: [  332.537597] Lustre:
Skipped 30 previous similar messages
Jun 27 11:23:31 r720xd-85z3zz1-ar-compute kernel: [  332.566262]
LustreError: 5282:0:(obd_config.c:1682:class_config_llog_handler())
MGC157.249.162.240@tcp: cfg command failed: rc = -38
Jun 27 11:23:31 r720xd-85z3zz1-ar-compute kernel: [  332.579682] Lustre:
 cmd=cf00f 0:WATZMANN-MDT0000-mdc  1:osc.active=0
Jun 27 11:23:31 r720xd-85z3zz1-ar-compute kernel: [  332.579682]
Jun 27 11:23:31 r720xd-85z3zz1-ar-compute kernel: [  332.579788]
LustreError: 15c-8: MGC157.249.162.240@tcp: The configuration from log
'WATZMANN-client' failed (-38). This may be the result of communication
errors between this node and the MGS, a bad configuration, or other errors.
See the syslog for more information.
Jun 27 11:23:31 r720xd-85z3zz1-ar-compute kernel: [  332.606605]
LustreError: 5280:0:(lov_obd.c:878:lov_cleanup())
WATZMANN-clilov-ffff881015e7b800: lov tgt 0 not cleaned! deathrow=0, lovrc=1
Jun 27 11:23:31 r720xd-85z3zz1-ar-compute kernel: [  332.620657]
LustreError: 5280:0:(lov_obd.c:878:lov_cleanup()) Skipped 150 previous
similar messages
Jun 27 11:23:31 r720xd-85z3zz1-ar-compute kernel: [  332.660767] Lustre:
Unmounted WATZMANN-client
Jun 27 11:23:31 r720xd-85z3zz1-ar-compute kernel: [  332.662894]
LustreError: 5280:0:(obd_mount.c:1582:lustre_fill_super()) Unable to mount
 (-38)
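
One thing we have not tried yet is dumping the client configuration log on
the MGS to see which record fails; the -38 from class_config_llog_handler()
while processing 'WATZMANN-client' suggests a config record our 2.10
clients do not recognize, possibly written by the 2.12 test client. If
someone thinks it would help, we could run something like this on the MGS
node (a sketch using standard lctl commands; exact output format may vary
by version):

```shell
# On the MGS node: list the configuration llogs it holds
lctl --device MGS llog_catlist

# Dump the client configuration log that fails with -38 on our clients;
# we would look for records a 2.10 client might not understand
lctl --device MGS llog_print WATZMANN-client
```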

*Syslog in the server* only shows this:
Jun 27 11:37:31 mds-b1 journal: Suppressed 1450 messages from /
Jun 27 11:37:53 mds-b1 kernel: Lustre: MGS: Connection restored to
3ac41542-d423-fff4-1153-38101587954b (at 157.249.160.114@tcp)
Jun 27 11:37:53 mds-b1 kernel: Lustre: Skipped 2 previous similar messages
Jun 27 11:38:02 mds-b1 journal: Suppressed 939 messages from /


I am also attaching the client-side traces in case someone can spot
something:
https://drive.google.com/file/d/1kMm3DDngLsWoAJ4THIx0QuBT2BpIOkMw/view

Many thanks!!!

Server:
Centos 7.6
Lustre 2.10.7
Kernel 3.10.0-957.1.3.el7_lustre

Client (one of many):
Ubuntu 16.04.6
Lustre 2.10.7 & 2.10.6 (tried both)
Kernel 4.4.0-142-generic / 4.4.0-131-generic

