[lustre-discuss] Cannot mount MDT after upgrading from Lustre 2.12.6 to 2.15.3

Andreas Dilger adilger at whamcloud.com
Thu Sep 28 02:17:53 PDT 2023


On Sep 26, 2023, at 13:44, Audet, Martin via lustre-discuss <lustre-discuss at lists.lustre.org> wrote:

Hello all,

I would appreciate it if the community would give more attention to this issue, because upgrading from 2.12.x to 2.15.x, the two LTS versions, is something we can expect many cluster admins to attempt in the next few months...

Who, in particular, is "the community"?

That term implies a collective effort, and I'd welcome feedback from your testing of the upgrade process.  It is definitely possible for an individual to install Lustre 2.12.9 on one or more VMs (or better, use a clone of your current server OS image), format a small test filesystem with the current configuration, copy some data into it, and then follow your planned process to upgrade to 2.15.3 (which should mostly just be "unmount everything, install new RPMs, mount").  That is prudent system administration to test the process in advance of changing your production system.
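For readers following along, the dry-run Andreas describes might look roughly like the sketch below on a throwaway VM. The device names, filesystem name, NID, and package names are placeholders for illustration; adapt them to your own setup and distribution.

```shell
# --- On a test VM (or a clone of a server image) still running Lustre 2.12.9 ---
# Format a small test filesystem: a combined MGS+MDT and one OST.
# /dev/vdb, /dev/vdc, and the NID below are placeholders.
mkfs.lustre --fsname=testfs --mgs --mdt --index=0 /dev/vdb
mkfs.lustre --fsname=testfs --ost --index=0 --mgsnode=192.168.1.10@tcp /dev/vdc

mkdir -p /mnt/mdt /mnt/ost0 /mnt/testfs
mount -t lustre /dev/vdb /mnt/mdt
mount -t lustre /dev/vdc /mnt/ost0
mount -t lustre 192.168.1.10@tcp:/testfs /mnt/testfs

# Copy in some representative data, then unmount everything.
cp -a /etc /mnt/testfs/sample-data
umount /mnt/testfs /mnt/ost0 /mnt/mdt

# --- Upgrade: install the 2.15.3 packages, then simply remount ---
# (package/command names vary by distro; note that no writeconf is involved)
dnf upgrade 'lustre*' 'kmod-lustre*' e2fsprogs
mount -t lustre /dev/vdb /mnt/mdt
mount -t lustre /dev/vdc /mnt/ost0
mount -t lustre 192.168.1.10@tcp:/testfs /mnt/testfs
```

If the test data is readable after the remount, the same "unmount, install, mount" sequence can be repeated on production with much more confidence.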

We ourselves plan to upgrade a small Lustre (production) system from 2.12.9 to 2.15.3 in the next couple of weeks...

After seeing problem reports like this, we are starting to feel a bit nervous...

The documentation for this major update does not seem very specific to me...

Patches with improvements to the process described in the manual are welcome.  Please see https://wiki.lustre.org/Lustre_Manual_Changes for details on how to submit your contributions.

In this document for example, https://doc.lustre.org/lustre_manual.xhtml#upgradinglustre , the update process appears not so difficult and there is no mention of using "tunefs.lustre --writeconf" for this kind of update.

Or am I missing something?

I think you are answering your own question here...  The documented upgrade process has no mention of running "writeconf", but it was run for an unknown reason. This introduced an unknown problem with the configuration files that prevented the target from mounting.

Then, rather than re-running writeconf to fix the configuration files, the entire MDT was copied to a new storage device (a large no-op IMHO, since any issue with the MDT config files would be copied along with it) and writeconf was run again to regenerate the configs, which could have been done just as easily on the original MDT.
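For reference, re-running writeconf in place follows the "Regenerating Lustre Configuration Logs" procedure in the manual; a rough sketch is below. The mount points and the OST device name are placeholders (/dev/md0 is the MDT device from this thread).

```shell
# Sketch of regenerating the configuration logs in place.
# Placeholders: /mnt/* mount points and /dev/sdX (each OST device).

# 1. Unmount clients first, then all OSTs, then the MDT/MGS.
umount /mnt/lustre            # on every client
umount /mnt/ost0              # on every OSS, for every OST
umount /mnt/mdt               # on the MDS

# 2. Run writeconf on EVERY target, not just the MDT.
tunefs.lustre --writeconf /dev/md0    # MDT (with co-located MGS)
tunefs.lustre --writeconf /dev/sdX    # each OST, on its OSS

# 3. Remount in order: MGS/MDT first, then OSTs, then clients.
mount -t lustre /dev/md0 /mnt/mdt
mount -t lustre /dev/sdX /mnt/ost0
```

Since writeconf only erases and regenerates the configuration llogs, running it a second time on the original MDT would have rebuilt the same configs the copy produced, without moving any data.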

So the relatively straightforward upgrade process was turned into a complicated one for no apparent reason.

There have been 2.12->2.15 upgrades done already in production without issues, and this is also tested continuously during development.  Of course there are a wide variety of different configurations, features, and hardware on which Lustre is run, and it isn't possible to test even a fraction of all configurations.  I don't think one problem report on the mailing list is an indication that there are fundamental issues with the upgrade process.

Cheers, Andreas

Thanks in advance for providing more tips for this kind of update.

Martin Audet
________________________________
From: lustre-discuss <lustre-discuss-bounces at lists.lustre.org> on behalf of Tung-Han Hsieh via lustre-discuss <lustre-discuss at lists.lustre.org>
Sent: September 23, 2023 2:20 PM
To: lustre-discuss at lists.lustre.org
Subject: [lustre-discuss] Cannot mount MDT after upgrading from Lustre 2.12.6 to 2.15.3

Dear All,

Today we tried to upgrade our Lustre file system from version 2.12.6 to 2.15.3, but afterwards we could not mount the MDT. Our MDT uses an ldiskfs backend. The upgrade procedure was:

1. Install the new version of e2fsprogs-1.47.0
2. Install Lustre-2.15.3
3. After reboot, run: tunefs.lustre --writeconf /dev/md0

Then when mounting MDT, we got the error message in dmesg:

===========================================================
[11662.434724] LDISKFS-fs (md0): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
[11662.584593] Lustre: 3440:0:(scrub.c:189:scrub_file_load()) chome-MDT0000: reset scrub OI count for format change (LU-16655)
[11666.036253] Lustre: MGS: Logs for fs chome were removed by user request.  All servers must be restarted in order to regenerate the logs: rc = 0
[11666.523144] Lustre: chome-MDT0000: Imperative Recovery not enabled, recovery window 300-900
[11666.594098] LustreError: 3440:0:(mdd_device.c:1355:mdd_prepare()) chome-MDD0000: get default LMV of root failed: rc = -2
[11666.594291] LustreError: 3440:0:(obd_mount_server.c:2027:server_fill_super()) Unable to start targets: -2
[11666.594951] Lustre: Failing over chome-MDT0000
[11672.868438] Lustre: 3440:0:(client.c:2295:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1695492248/real 1695492248]  req at 000000005dfd9b53 x1777852464760768/t0(0) o251->MGC192.168.32.240 at o2ib@0 at lo:26/25 lens 224/224 e 0 to 1 dl 1695492254 ref 2 fl Rpc:XNQr/0/ffffffff rc 0/-1 job:''
[11672.925905] Lustre: server umount chome-MDT0000 complete
[11672.926036] LustreError: 3440:0:(super25.c:183:lustre_fill_super()) llite: Unable to mount <unknown>: rc = -2
[11872.893970] LDISKFS-fs (md0): mounted filesystem with ordered data mode. Opts: (null)
============================================================

Could anyone help solve this problem? Sorry, it is really urgent.

Thank you very much.

T.H.Hsieh
_______________________________________________
lustre-discuss mailing list
lustre-discuss at lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Whamcloud
