[lustre-discuss] 1.8 client on 3.13.0 kernel
lhyatt at gmail.com
Thu Sep 10 12:11:12 PDT 2015
Thanks a lot for the info; it makes me a little more optimistic :-).
On 9/10/15 11:17 AM, Mohr Jr, Richard Frank (Rick Mohr) wrote:
> I did an upgrade from Lustre 1.8.6 to 2.4.3 on our servers, and for the most part things went pretty well. I’ll chime in on a couple of Martin’s points and mention a few other things.
>> On Sep 10, 2015, at 9:30 AM, Martin Hecht <hecht at hlrs.de> wrote:
>> In any case, the file systems should be clean before starting the
>> upgrade, so I would recommend running e2fsck on all targets and repairing
>> them before starting the upgrade. We did so, but unfortunately our
>> e2fsprogs were not really up to date, and after our Lustre upgrade a lot
>> of fixes for e2fsprogs were committed to Whamcloud's e2fsprogs git. So
>> probably some errors on the file systems were still present but
>> went unnoticed when we did the upgrade.
> This is a very important point. I didn’t run e2fsck before the upgrade (though maybe I should have), but I made sure to install the latest e2fsprogs.
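The pre-upgrade check Martin describes can be scripted. The Python sketch below is an illustration only, not something anyone in this thread posted. It assumes the targets are already unmounted, that the `e2fsck` on the PATH comes from the up-to-date Lustre e2fsprogs, and that the device names are hypothetical placeholders. It runs only a forced, read-only check (`e2fsck -f -n`) and decodes the exit-status bits documented in the e2fsck(8) man page. Any repairs are still left to the administrator.

```python
import subprocess
from typing import Callable, Dict, List, Sequence

# e2fsck exit-status bits as documented in the e2fsck(8) man page.
EXIT_BITS = {
    1: "errors corrected",
    2: "errors corrected, reboot needed",
    4: "errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "cancelled by user request",
    128: "shared-library error",
}


def fsck_command(device: str) -> List[str]:
    """Build a forced (-f), read-only (-n: open read-only, answer 'no') check."""
    return ["e2fsck", "-f", "-n", device]


def describe_status(code: int) -> List[str]:
    """Decode an e2fsck exit status into its set bits; an empty list means clean."""
    return [text for bit, text in EXIT_BITS.items() if code & bit]


def check_targets(
    devices: Sequence[str],
    runner: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> Dict[str, List[str]]:
    """Run the read-only check on each (unmounted) target and collect findings."""
    report: Dict[str, List[str]] = {}
    for dev in devices:
        result = runner(fsck_command(dev), capture_output=True, text=True)
        report[dev] = describe_status(result.returncode)
    return report


if __name__ == "__main__":
    # Dry run only: print the commands for hypothetical MDT/OST devices.
    for dev in ["/dev/mapper/mdt0", "/dev/mapper/ost0"]:
        print(" ".join(fsck_command(dev)))
```

Injecting `runner` keeps the decoding logic testable without real block devices. A target reporting "errors left uncorrected" (bit 4, which is what `-n` returns when it finds problems) would then need a separate, deliberate repair pass before the upgrade.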
>> Lustre 2 introduces the FID, which is something like an inode number.
>> Lustre 1.8 used the inode number of the underlying ldiskfs, but with
>> the possibility of having several MDTs in one file system, a
>> replacement was needed. The FID is stored in the inode, but Lustre can
>> also be configured to store FIDs in the directory entries, which makes
>> lookups faster, especially when there are many files in a directory.
>> However, there were bugs in the code that adds the FID to the
>> directory entries when the file system is converted from 1.8 to 2.x,
>> so I would recommend using a version in which these bugs are fixed.
>> We went to 2.4.1 at that time. This fid_in_dirent feature is not
>> enabled by default; however, it is the only point where a performance
>> boost may be expected... so we took the risk of enabling it... and ran
>> into some bugs.
> Enabling fid_in_dirent prevents you from backing out of the upgrade. In theory, if you upgraded to Lustre 2.x without enabling fid_in_dirent, you could always revert to Lustre 1.8. We tried this on a test system, and the downgrade seemed to work. However, this was a small-scale test, and I have never tried it on a production file system. But if you want to minimize possible complications, you could leave it disabled for a while after the upgrade and then, if things are going well, enable it later on.
>> LU-4504 quota out of sync: turn off quota, run e2fsck, turn it on again.
>> I believe that's something which has to be done fairly often anyway,
>> because there is no quotacheck anymore. The check runs in the background
>> when quotas are enabled, but the file systems have to be unmounted for this.
> We didn’t hit exactly this bug, but I will mention that we have had a couple of instances where e2fsck complained about problems on an OST, and it turned out that we had to disable and re-enable quotas on the OST to correct the issue.
>> LU-4743: We had to remove the CATALOGS file on another file system
>> (otherwise the MDT wouldn't mount)
> We hit this problem.
> Someone I know had to do a Lustre upgrade, and they suggested that I apply a patch for LU-4708 (which I did). But if you upgrade to Lustre 2.5.2 or later, that patch should already be included.
> My only other advice is to test as much as possible prior to the upgrade. If you have some test hardware, install the same Lustre 1.8 version you are currently running in production and then try upgrading it to the new Lustre version. Preparation is the key. I think I spent about two months reading about upgrade procedures, talking with others who had upgraded, reading JIRA bug reports, and running tests on hardware.
> Rick Mohr
> Senior HPC System Administrator
> National Institute for Computational Sciences
> lustre-discuss mailing list
> lustre-discuss at lists.lustre.org