[lustre-discuss] 1.8 client on 3.13.0 kernel

Lewis Hyatt lhyatt at gmail.com
Thu Sep 10 06:37:41 PDT 2015


Thanks very much for this. Will let you know how we come out once we absorb 
this and get the courage to pull the trigger.

-lewis


On 9/10/15 9:30 AM, Martin Hecht wrote:
> Hi Lewis,
>
> it's difficult to tell how much of the data loss was actually related to
> the lustre upgrade itself. We upgraded 6 file systems, and we had to do
> it more or less in one shot, because at that time they were using a
> common MGS server. All servers of one file system must be at the same
> level (at least for the major upgrade from 1.8 to 2.x; there is a rolling
> upgrade for minor versions in the lustre 2 branch now, but I have no
> experience with that).
>
> In any case the file systems should be clean before starting the
> upgrade, so I would recommend running e2fsck on all targets and
> repairing them first. We did so, but unfortunately our e2fsprogs were
> not really up to date, and after our lustre upgrade a lot of fixes for
> e2fsprogs were committed to Whamcloud's e2fsprogs git. So probably some
> errors were still present on the file systems, but went unnoticed when
> we did the upgrade.
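>
> For illustration, the pre-upgrade check could look roughly like this on
> each unmounted MDT/OST (device names are placeholders for your targets,
> ldiskfs backend assumed):
>
>     e2fsck -f -p /dev/mdtdev    # -p: automatic, safe repairs only
>     e2fsck -f -y /dev/mdtdev    # if -p gives up, answer yes to all fixes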
>
> Lustre 2 introduces the FID (which is something like an inode number;
> lustre 1.8 used the inode number of the underlying ldiskfs, but with
> the possibility of having several MDTs in one file system a replacement
> was needed). The FID is stored in the inode, but it can additionally be
> stored in the directory entry, which makes lookups faster, especially
> when there are many files in a directory. However, there were bugs in
> the code that takes care of adding the FID to the directory entry when
> the file system is converted from 1.8 to 2.x, so I would recommend
> using a version in which these bugs are fixed. We went to 2.4.1 at the
> time. This fid_in_dirent feature is not enabled automatically, but it
> is the only point where a performance boost may be expected... so we
> took the risk of enabling it... and ran into some bugs.
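>
> If you do decide to enable it, on an upgraded file system it is, as far
> as I recall, switched on at the ldiskfs level of the unmounted MDT,
> something like this (device name is a placeholder; please check the
> manual for your target version first):
>
>     tune2fs -O dirdata /dev/mdtdev    # ldiskfs feature behind fid_in_dirent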
>
> We had other file systems still on 1.8, so with the server upgrade we
> didn't upgrade the clients, because lustre 2 clients wouldn't have been
> able to mount the 1.8 file systems. We also use quotas, and for this you
> need the 1.8.9 client with a patch that corrects a defect of the 1.8.9
> client when it talks to 2.x servers (LU-3067); older 1.8 clients don't
> support the Lustre 2 quota (which came in 2.2 or 2.4, I'm not 100%
> sure). BTW, it still runs out of sync from time to time, but the limit
> seems to be fine now, it's just the numbers the users see: lfs quota
> prints numbers that are too low, and users run out of quota earlier than
> they expect... It's better in the latest 2.5 versions now.
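>
> The symptom shows up directly in what users get from e.g. (mount point
> is a placeholder):
>
>     lfs quota -u someuser /mnt/lustre    # reported usage can lag behind reality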
>
> Here is an unsorted(!) list of bugs we hit during the lustre upgrade.
> For most of them we weren't the first ones, but I guess you could wait
> forever for the version in which all bugs are resolved :-)
>
> LU-3067 - already mentioned above, a patch for 1.8.9 clients
> interoperating with 2.x servers; 1.8.9 is needed to have quota working.
> Without this patch clients become unresponsive with 100% cpu load, then
> just hang and devices become unavailable; a reboot doesn't work, so a
> power cycle is needed, but after a while the problem reappears.
>
> LU-4504 - e2fsck noticed quota issues similar to this bug on OSTs - use
> the latest e2fsprogs and check again; after that the ldiskfs backend
> doesn't run into this anymore.
>
> e2fsck noticed quota issues on the MDT ("Problem in HTREE directory
> inode 21685465: block #16 not referenced"), but this could be fixed by
> e2fsck.
>
> LU-5626 mdt becomes readonly: on one file system where the MDT had been
> corrupted at an earlier stage and obviously not fully repaired, the MDS
> LBUGed upon MDT mount; it could only be mounted with the noscrub option.
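>
> Mounting with noscrub is just the normal server mount with an extra
> option, for example (device and mount point are placeholders):
>
>     mount -t lustre -o noscrub /dev/mdtdev /mnt/lustre-mdt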
>
> The mdt group_upcall (which can be configured with tunefs) used to be
> /usr/sbin/l_getgroups in lustre 1.8 and was set by default - the program
> is called l_getidentity now and is not configured by default anymore.
> You should either change it with tunefs, or put an appropriate link in
> place as a fallback. Anyhow, lustre 2 file systems don't use it by
> default anymore; they just trust the client. It also means that
> users/groups are not needed anymore on the lustre servers. (We had local
> passwd/group files there so that secondary groups work properly;
> alternatively you could configure ldap, but without the group_upcall all
> of this is handled by the lustre client.)
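>
> To set the upcall explicitly, something along these lines should work
> (device name is a placeholder; check the parameter name for your
> version):
>
>     tunefs.lustre --param mdt.identity_upcall=/usr/sbin/l_getidentity /dev/mdtdev
>
> or, as the fallback, keep the old configured path valid with a symlink:
>
>     ln -s /usr/sbin/l_getidentity /usr/sbin/l_getgroups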
>
> LU-5626 and LU-2627: ".." directory entries were damaged by adding the
> FID. Once all old directories had been converted and all files somehow
> recovered (in several consecutive attempts), the problem was gone. The
> number of emergency maintenances is basically limited by the depth of
> your directory structure. It could be repaired by running e2fsck,
> followed by manually moving everything back (save the log of the e2fsck,
> which tells you the relation between the objects in lost+found and their
> original paths!).
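>
> The repair boils down to something along these lines (a rough sketch;
> device and paths are placeholders, ldiskfs backend assumed):
>
>     e2fsck -f -y /dev/mdtdev 2>&1 | tee /root/e2fsck-mdt.log   # keep this log!
>     mount -t ldiskfs /dev/mdtdev /mnt/mdt-ldiskfs
>     # then move the objects from /mnt/mdt-ldiskfs/lost+found back to the
>     # original paths recorded in the e2fsck log, and unmount again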
>
> LU-4504 quota out of sync: turn off quota, run e2fsck, turn it on again
> - I believe that's something which must be done quite often anyhow,
> because there is no quotacheck anymore. It is run in the background when
> enabling quotas, but the file systems have to be unmounted for this.
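>
> On 2.4 that would be roughly the following (fsname and device are
> placeholders; conf_param is run on the MGS, e2fsck on the unmounted
> target):
>
>     lctl conf_param myfs.quota.mdt=none    # disable enforcement
>     lctl conf_param myfs.quota.ost=none
>     e2fsck -f -p /dev/targetdev            # rebuilds the quota accounting
>     lctl conf_param myfs.quota.mdt=ug      # re-enable for users and groups
>     lctl conf_param myfs.quota.ost=ug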
>
> Related to quota, there is a change in the lfs setquota command. The
> manual says that soft limits must be < hard limits, but you have to
> specify them. You could put a zero, but in later versions the value must
> be present on the command line. In 1.8 lfs setquota was more relaxed,
> but it simply didn't initialize some values properly. This change caused
> our quota management to fail; however, after fixing the call it worked
> fine again.
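>
> In other words, a call with all four limits given explicitly works, for
> example (user name and mount point are placeholders; block limits are
> in kbytes):
>
>     lfs setquota -u someuser -b 0 -B 104857600 -i 0 -I 1000000 /mnt/lustre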
>
> LU-3861 quota severely broken: It was not possible to move files for
> some users/groups while it worked for others; copying, on the other
> hand, seemed to work. Maybe this was in combination with one of the
> first attempts to fix the fid issue. However, neither e2fsck nor tune2fs
> could fix the problem. We had to upgrade to e2fsprogs 1.42.7, which
> contained some improvements that made e2fsck able to fix this and
> allowed ldiskfs to run more stably afterwards.
>
> LU-3917: During the upgrade we needed to re-create the PENDING directory
> on the ldiskfs level on one of our file systems.
>
> LU-4743: We had to remove the CATALOGS file on another file system
> (otherwise the MDT wouldn't mount)
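>
> Both were done on the ldiskfs level with the MDT not mounted as lustre,
> roughly like this (device and mount point are placeholders; check the
> tickets for the exact permissions of PENDING before recreating it):
>
>     mount -t ldiskfs /dev/mdtdev /mnt/mdt-ldiskfs
>     mkdir /mnt/mdt-ldiskfs/PENDING     # LU-3917, see ticket for mode/owner
>     rm /mnt/mdt-ldiskfs/CATALOGS       # LU-4743
>     umount /mnt/mdt-ldiskfs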
>
> And if you upgrade to 2.5, there was a bug which caused the MDS to crash
> when large_xattr (for wide striping) was not set and a user tried to use
> it anyway. But probably you don't have that many OSTs, because the
> number was limited anyway in 1.8.
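>
> If you do have many OSTs, the feature can be enabled on the unmounted
> MDT, something like this (device name is a placeholder; on newer
> e2fsprogs the feature is called ea_inode):
>
>     tune2fs -O large_xattr /dev/mdtdev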
>
> A couple of other problems were related to the software which our
> supplier uses to manage the lustre servers, but that's not a lustre
> issue, it's just how a large number of servers is booted, maintained and
> configured. Anyhow, fighting these problems on top didn't make things
> easier ;-)
>
> That was a very much shortened list of our upgrade trouble (shortened
> not in the number of issues, but by leaving out the log messages,
> discussions, and attempts to repair things...). Later we also configured
> a separate MGS for each file system, upgraded once more to 2.5, and
> reworked the lnet configuration - all of that was much less trouble than
> the upgrade from 1.8 to 2.4.1. Maybe, looking back, that was a bad
> version to pick, but at some point you have to decide on a target
> version - and maybe I would do exactly the same step again, now with the
> knowledge of what can happen and which things I must keep an eye on. I
> wouldn't enable the fid_in_dirent feature, though, and I would for sure
> update e2fsprogs as a first step.
>
> best regards,
> Martin
>
> On 09/09/2015 03:16 PM, Lewis Hyatt wrote:
>> OK thanks for sharing your experience. Unfortunately I can't see a way
>> for us to get duplicate hardware, so we will have to give it a shot;
>> we were going to try the artificial test first as well. If you don't
>> mind taking another minute, I'd be curious what was the nature of the
>> problems you ran into... was it potential data loss, or just issues
>> getting it to perform the upgrade? Thanks again.
>>
>> -lewis
>
>

