[lustre-discuss] Unable to mount ZFS based snapshots

Robert Redl robert.redl at lmu.de
Mon Jul 20 05:33:37 PDT 2020


Dear Lustre Community,

For some time now, we have not been able to mount ZFS-based snapshots
for one of our filesystems. Some details about the setup:

- OS: CentOS 7, 3.10.0-1127.8.2.el7_lustre.x86_64
- Lustre: 2.12.5
- Servers: 2x MDT, 11xOST
- Network: only TCP
- MDTs share machines with OSTs.

We have two separate Lustre filesystems (meteo0 and meteo1), both
running on the same hardware. One of them (meteo1) works perfectly,
including mounting snapshots; for the other one (meteo0) we can no
longer mount snapshots. The only difference we are aware of is that we
registered a changelog user on the filesystem that can no longer mount
snapshots. Removing the changelog user did not change the situation.

Mounting a snapshot on a client just hangs forever without anything
meaningful in the system log. Once the mount process is canceled with
Ctrl-C, the following shows up in the logs:

kernel: LustreError: 21966:0:(lmv_obd.c:1415:lmv_statfs())
7a21a072-MDT0000-mdc-ffff9988697c8800: can't stat MDS #0: rc = -11
kernel: LustreError: 21966:0:(lov_obd.c:839:lov_cleanup())
7a21a072-clilov-ffff9988697c8800: lov tgt 0 not cleaned! deathrow=0, lovrc=1
kernel: LustreError: 21966:0:(lov_obd.c:839:lov_cleanup()) Skipped 10
previous similar messages
kernel: Lustre: Unmounted 7a21a072-client
kernel: LustreError: 21966:0:(obd_mount.c:1608:lustre_fill_super())
Unable to mount  (-11)

On the server side, mounting looks normal:

kernel: Lustre: 7a21a072-MDT0000: Imperative Recovery enabled, recovery
window shrunk from 300-900 down to 150-900
kernel: Lustre: 7a21a072-MDT0000: nosquash_nids set to
10.153.52.[28-41]@tcp ...
kernel: Lustre: 7a21a072-MDT0000: root_squash is set to 65534:65534
kernel: Lustre: 7a21a072-MDT0000: set dev_rdonly on this device
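
For context, the snapshot is mounted on the server side with the lctl
snapshot commands before the client mount is attempted; roughly (the
snapshot name <snapname> is a placeholder, 7a21a072 is the fsname under
which this snapshot is served):

lctl snapshot_mount -F meteo0 -n <snapname>    # mounts the snapshot targets read-only on the servers
lctl snapshot_list -F meteo0 -n <snapname>     # lists the snapshot and its status

The 'set dev_rdonly' line above matches the read-only mount that
snapshot_mount performs.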

A network problem seems unlikely, because the actual filesystem from
which the snapshots are made works normally. Clients can mount it
without any problems.
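
As an additional sanity check at the LNet level, the MDS NID can be
pinged directly from the client (10.153.52.30@tcp is the NID of the
MDT0000 server that shows up in the debug log further below):

lctl ping 10.153.52.30@tcp     # LNet-level ping of the MDS NID
lctl get_param mdc.*.import    # connection state of every MDC import on the client

This only checks connectivity, of course, not whether the snapshot
targets accept the connect requests.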

I tried to look at debug messages on the client to verify that there is
no connection problem:

lctl set_param debug=+rpctrace; lctl set_param debug=+net; lctl clear
lctl mark "debug start"

mount -t lustre -o ro met-sv-lustre@tcp:/7a21a072 /mnt

lctl mark "debug finish"
lctl set_param debug=-rpctrace; lctl set_param debug=-net
lctl dk > /tmp/log

The connection to the MGS works, and the connection to the second MDT
works as well:

00000100:00080000:8.0:1595229534.555839:0:6848:0:(import.c:86:import_set_state_nolock())
00000000e70957a2 MGS: changing import state from CONNECTING to FULL
00000100:00080000:0.0:1595229534.934793:0:6848:0:(import.c:86:import_set_state_nolock())
00000000ba745de3 7a21a072-MDT0001_UUID: changing import state from
CONNECTING to FULL

But the connection to the first MDT fails, which is confusing, because
the ZFS snapshot dataset is mounted on the server that is shown in the
log message:

00000100:00100000:0.0:1595229534.934627:0:6848:0:(client.c:2719:ptlrpc_free_committed())
7a21a072-MDT0000-mdc-ffff9981d3d1d800: committing for last_committed 0 gen 1
00000100:00080000:0.0:1595229534.934632:0:6848:0:(import.c:86:import_set_state_nolock())
00000000d6a94009 7a21a072-MDT0000_UUID: changing import state from
CONNECTING to DISCONN
00000100:00080000:0.0:1595229534.934635:0:6848:0:(import.c:1382:ptlrpc_connect_interpret())
recovery of 7a21a072-MDT0000_UUID on 10.153.52.30@tcp failed (-11)

If I try to mount the actual filesystem, which of course runs on the
very same servers, then all connections are successful.

On the servers, all ZFS snapshot datasets are mounted and no errors are
shown. Every MDT dataset and every OST dataset has a snapshot, and they
are all mounted.
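
To double-check this on the servers, something along these lines can be
used (dataset names depend on the pool layout, so they are omitted here):

zfs list -t snapshot           # every MDT/OST dataset shows the snapshot
mount -t lustre                # the snapshot targets are mounted (read-only)
lctl snapshot_list -F meteo0   # snapshot status as seen by Lustre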

Looking at the debug messages on the servers shows some connection problems:

00000100:00080000:19.0:1595238913.879579:0:9139:0:(import.c:86:import_set_state_nolock())
ffff8e66a1357800 7a21a072-MDT0001_UUID: changing import state from
CONNECTING to DISCONN
00000100:00080000:19.0:1595238913.881596:0:9139:0:(import.c:86:import_set_state_nolock())
ffff8e46fc751000 7a21a072-OST0004_UUID: changing import state from
CONNECTING to FULL
00000100:00080000:19.0:1595238913.881734:0:9139:0:(import.c:86:import_set_state_nolock())
ffff8e46aa7cf000 7a21a072-OST0005_UUID: changing import state from
CONNECTING to FULL
00000100:00080000:19.0:1595238913.881946:0:9139:0:(import.c:86:import_set_state_nolock())
ffff8e5f4e923000 7a21a072-OST0006_UUID: changing import state from
CONNECTING to FULL
00000100:00080000:19.0:1595238913.882106:0:9139:0:(import.c:86:import_set_state_nolock())
ffff8e5f4e927800 7a21a072-OST0007_UUID: changing import state from
CONNECTING to FULL
00000100:00080000:19.0:1595238913.882444:0:9139:0:(import.c:86:import_set_state_nolock())
ffff8e5692d33000 7a21a072-OST0008_UUID: changing import state from
CONNECTING to FULL
00000100:00080000:19.0:1595238913.882587:0:9139:0:(import.c:86:import_set_state_nolock())
ffff8e5692d35800 7a21a072-OST0009_UUID: changing import state from
CONNECTING to FULL
00000100:00080000:19.0:1595238913.883326:0:9139:0:(import.c:86:import_set_state_nolock())
ffff8e36524e4800 7a21a072-OST000a_UUID: changing import state from
CONNECTING to FULL
00000100:00080000:19.0:1595238913.889291:0:9139:0:(import.c:86:import_set_state_nolock())
ffff8e36524e6800 7a21a072-MDT0000_UUID: changing import state from
CONNECTING to DISCONN
00000100:00080000:19.0:1595238918.888697:0:9139:0:(import.c:86:import_set_state_nolock())
ffff8e5692d34000 7a21a072-OST0003_UUID: changing import state from
CONNECTING to DISCONN
00000100:00080000:19.0:1595238918.888703:0:9139:0:(import.c:86:import_set_state_nolock())
ffff8e66a1352800 7a21a072-OST0002_UUID: changing import state from
CONNECTING to DISCONN
00000100:00080000:19.0:1595238918.888707:0:9139:0:(import.c:86:import_set_state_nolock())
ffff8e46aa7c9000 7a21a072-OST0001_UUID: changing import state from
CONNECTING to DISCONN
00000100:00080000:19.0:1595238918.888709:0:9139:0:(import.c:86:import_set_state_nolock())
ffff8e5fb3f2f000 7a21a072-OST0000_UUID: changing import state from
CONNECTING to DISCONN

Both MDTs and OST[0-3] are running on one HA server pair; the other
OSTs, which connect successfully, are running on different server pairs.

Has anyone experienced comparable problems? Any suggestions for what we
could try next?

What we have tried already:
- Removing the changelog user with lctl changelog_deregister. This did
not change anything, but I am not sure whether it really removed all
changelog-related information. If you register a new user, the number
of the user is incremented: after removing cl1, adding a new user
results in cl2, not in cl1 again. That suggests not all traces of cl1
have been removed. The size of the changelog, as seen by 'lctl
get_param "*.*.changelog_size"', is also larger than zero after
removing the changelog user. Any ideas on how to completely remove the
changelog users? (The commands we used are sketched below, after this
list.)
- Running lfsck. Some things were corrected, but apparently nothing
related to the snapshot problem.
- Rewriting the configuration using tunefs.lustre --writeconf. Also no
effect.
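
For reference, these are roughly the commands used for the changelog
part above (the MDT device names for meteo0 are written out here; cl1
is the user index from our setup):

lctl get_param mdd.meteo0-MDT*.changelog_users          # registered users and their current index
lctl --device meteo0-MDT0000 changelog_deregister cl1   # run on the MDS holding MDT0000
lctl get_param mdd.meteo0-MDT*.changelog_size           # still larger than zero afterwards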

Because the filesystem meteo1, which shares all servers with meteo0,
works perfectly, and the only difference I know about is the changelog
user, I would guess that the changelog user plays a role. But that is
only speculation.

I would be very happy to get some useful advice from you!

Robert

-- 

Dr. Robert Redl
Scientific Programmer, "Waves to Weather" (SFB/TRR165)
Meteorologisches Institut
Ludwig-Maximilians-Universität München
Theresienstr. 37, 80333 München, Germany
