[lustre-discuss] Appropriate Umount Ordering

Thu Feb 17 08:07:56 PST 2022

Hi all,

Two (hopefully) simple questions this time around.  This is for 2.14.0, and my cluster is set up with no failover for MDTs or OSTs.  OBD timeouts have not been altered from the defaults.

Question 1:

I read on the Lustre Wiki that the appropriate ordering to umount the various components of a Lustre filesystem is:
1. Clients
2. MDT(s)
3. OSTs
4. MGS

However, if I do it this way, each OST umount always hangs for 4 minutes and 25 seconds (04:25) before completing.  Dmesg reports:
[88944.272233] Lustre: 30178:0:(client.c:2282:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1645111309/real 1645111309]  req@00000000cc9c1aeb x1724931853622016/t0(0) o39->lustrefs-MDT0000-lwp-OST0000@10.1.98.8@tcp:12/10 lens 224/224 e 0 to 1 dl 1645111574 ref 2 fl Rpc:XNQr/0/ffffffff rc 0/-1 job:''
[88944.275884] Lustre: Failing over lustrefs-OST0000
[88944.429622] Lustre: server umount lustrefs-OST0000 complete
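
For what it's worth, the timestamps in that first line seem to account for the whole delay. Here is a minimal sketch of the arithmetic, assuming the "sent" and "dl" fields are the send time and deadline in epoch seconds (my reading of the log format, not something I have confirmed). The helper name rpc_wait_seconds is just for illustration.

```python
import re

# Excerpt of the ptlrpc_expire_one_request() line from dmesg, keeping only
# the fields used here (the request and target fields are left out).
LOG = ("Request sent has timed out for slow reply: "
       "[sent 1645111309/real 1645111309] lens 224/224 e 0 to 1 dl 1645111574 ref 2")

def rpc_wait_seconds(line):
    """Return deadline minus send time, assuming both are epoch seconds."""
    sent = int(re.search(r"\[sent (\d+)/", line).group(1))
    deadline = int(re.search(r"\bdl (\d+)", line).group(1))
    return deadline - sent

wait = rpc_wait_seconds(LOG)
minutes, seconds = divmod(wait, 60)
print(f"{wait} s = {minutes:02d}:{seconds:02d}")  # 265 s = 04:25
```

So the hang matches the gap between when the request was sent and its deadline, to the second.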

For reference, if I reverse OSTs and MDT (do the MDT second), then all of the OST umounts are fast, but the MDT takes a whopping 8 minutes and 50 seconds to umount.
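
The two delays also look related: the MDT delay is exactly twice the OST delay. A quick check of that, where to_seconds is just a helper I made up for the conversion:

```python
def to_seconds(mmss):
    """Convert an 'MM:SS' string into a number of seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)

ost_delay = to_seconds("04:25")  # OST umount hang, canonical ordering
mdt_delay = to_seconds("08:50")  # MDT umount hang, reversed ordering
print(ost_delay, mdt_delay, mdt_delay / ost_delay)  # 265 530 2.0
```

That might mean two timeouts of the same length back to back, though that is only a guess on my part.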

Why does the canonical shutdown ordering produce such a long (and so specific) delay for me?

Question 2:

Whenever I umount an OST or MDT, whether the umount is fast or not, I see messages like the following in dmesg:
[88944.275884] Lustre: Failing over lustrefs-OST0000
or
[78406.007678] Lustre: Failing over lustrefs-MDT0000

There is no failover configured in my setup.  The MGS is up the entire time in all cases.  What is Lustre doing here?  How do I explicitly disable this failover attempt, since it seems to be at best misleading and at worst directly related to the lengthy delays?  FWIW, I have tried umount with '-f' to make the MDT go into failout rather than failover, to no avail.

Thanks in advance for any help folks can offer on this,

ellis