[lustre-discuss] 1.8 client on 3.13.0 kernel

Patrick Farrell paf at cray.com
Fri Sep 11 07:14:22 PDT 2015

Having an MDT backup might have allowed you to recover and then retry with an improved upgrade process, or to upgrade to a version that includes the fixes.  It's not a bad idea if practical.  (And yes, the changes are MDT-specific.)

By the way, the fid-in-dirent bug that Martin described is fixed in the most recent 2.5 from Intel, but I don't think it's fixed in 2.4.  I'm not sure.
Either way, I'd recommend targeting 2.5 as the destination version for an upgrade.

From: lustre-discuss [lustre-discuss-bounces at lists.lustre.org] on behalf of Chris Hunter [chris.hunter at yale.edu]
Sent: Friday, September 11, 2015 8:02 AM
To: lustre-discuss at lists.lustre.org
Subject: Re: [lustre-discuss] 1.8 client on 3.13.0 kernel

I believe the FID and dirdata feature changes would only affect the MDT
during a Lustre upgrade. In hindsight, do you think a file-level
backup/restore of the MDT would have avoided some of these issues?

chris hunter

> On 9/10/15 11:17 AM, Mohr Jr, Richard Frank (Rick Mohr) wrote:
>> Lewis,
>> I did an upgrade from Lustre 1.8.6 to 2.4.3 on our servers, and for the most part things went pretty well.  I'll chime in on a couple of Martin's points and mention a few other things.
>>> On Sep 10, 2015, at 9:30 AM, Martin Hecht <hecht at hlrs.de> wrote:
>>> In any case the file systems should be clean before starting the
>>> upgrade, so I would recommend running e2fsck on all targets and
>>> repairing them first. We did so, but unfortunately our e2fsprogs
>>> were not really up to date, and after our Lustre upgrade a lot of
>>> fixes were committed to Whamcloud's e2fsprogs git. So some errors
>>> on the file systems were probably still present, but unnoticed,
>>> when we did the upgrade.
>> This is a very important point.  While I didn't run e2fsck before the upgrade (but maybe I should have), I made sure to install the latest e2fsprogs.
>>> Lustre 2 introduces the FID (something like an inode number: Lustre
>>> 1.8 used the inode number of the underlying ldiskfs, but with the
>>> possibility of having several MDTs in one file system a replacement
>>> was needed). The FID is stored in the inode, but the file system can
>>> also be set up to store FIDs in the directory entry, which makes
>>> lookups faster, especially when there are many files in a directory.
>>> However, there were bugs in the code that takes care of adding the
>>> FID to the directory entry when the file system is converted from
>>> 1.8 to 2.x, so I would recommend using a version in which these bugs
>>> are fixed. We went to 2.4.1 at that time. The fid_in_dirent feature
>>> is not enabled by default; however, it is the only point where a
>>> performance boost may be expected, so we took the risk of enabling
>>> it and ran into some bugs.
>> Enabling fid_in_dirent prevents you from backing out of the upgrade.  In theory, if you upgraded to Lustre 2.x without enabling fid_in_dirent, you could always revert to Lustre 1.8.  We tried this on a test system, and the downgrade seemed to work.  However, this was a small-scale test and I have never tried it on a production file system.  But if you want to minimize possible complications, you could always leave this disabled for a while after the upgrade, and then if things are going well, enable it later on.
>>> LU-4504 quota out of sync: turn off quota, run e2fsck, turn it on again
>>> - I believe this has to be done fairly often anyway, because there is
>>> no quotacheck anymore. It's run in the background when enabling
>>> quotas, but file systems have to be unmounted for this.
>> We didn't exactly hit this bug, but I will mention that we have had a couple of instances where e2fsck complained about problems on an OST, and it turned out that we had to disable and re-enable quotas on the OST to correct the issue.
>>> LU-4743: We had to remove the CATALOGS file on another file system
>>> (otherwise the MDT wouldn't mount)
>> We hit this problem.
>> Someone I know had to do a Lustre upgrade, and they suggested that I apply a patch for LU-4708 (which I did).  But if you upgrade to Lustre 2.5.2 or later, that patch should already be included.
>> My only other advice is to test as much as possible prior to the upgrade.  If you have a little test hardware, install the same Lustre 1.8 version you are currently running in production and then try upgrading that to the new Lustre version.  I think preparation is the key.  I think I spent about 2 months reading about upgrade procedures, talking with others who have upgraded, reading JIRA bug reports, and running tests on hardware.
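
Two of the points quoted above lend themselves to a small pre-upgrade helper: Rick's remark that Lustre 2.5.2 or later should already include the LU-4708 patch, and Martin's advice to check every target with e2fsck before upgrading. The following is only a minimal sketch under stated assumptions, not a procedure anyone in this thread used. The function names and the device paths are hypothetical, the 2.5.2 threshold is taken solely from the remark in this thread, and a version number alone cannot tell you whether a vendor build carries a backported fix. The e2fsck commands are only built, not executed, and use the read-only options -f (force a check) and -n (answer "no" to all questions) as a first pass before any repair run.

```python
import re

# Assumption: taken from the remark in this thread that Lustre 2.5.2 or
# later should already include the LU-4708 patch. Not verified here.
LU4708_FIXED_IN = (2, 5, 2)


def parse_version(text):
    """Turn a dotted version string such as '2.4.3' into a tuple of ints.

    A missing third component ('2.5') is treated as 0.
    """
    match = re.fullmatch(r"(\d+)\.(\d+)(?:\.(\d+))?", text.strip())
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return tuple(int(part) if part is not None else 0
                 for part in match.groups())


def needs_lu4708_patch(target_version):
    """Return True if the target release is older than 2.5.2."""
    return parse_version(target_version) < LU4708_FIXED_IN


def fsck_dry_run_commands(devices):
    """Build one read-only e2fsck command per target device.

    The commands are returned as argument lists and are NOT executed.
    """
    return [["e2fsck", "-f", "-n", device] for device in devices]


if __name__ == "__main__":
    # Hypothetical target version and device paths, for illustration only.
    target = "2.4.3"
    if needs_lu4708_patch(target):
        print(f"{target}: consider applying the LU-4708 patch separately")
    for command in fsck_dry_run_commands(["/dev/mapper/mdt0", "/dev/mapper/ost0"]):
        print(" ".join(command))
```

Printing the commands rather than running them keeps the sketch harmless on a live system; the targets have to be unmounted before e2fsck is actually run against them.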