[lustre-discuss] problem after upgrading 2.10.4 to 2.12.4

Hebenstreit, Michael michael.hebenstreit at intel.com
Wed Jun 24 11:43:56 PDT 2020


I would not plan a direct upgrade until Whamcloud fixes the underlying issue. Currently the only viable way seems to be a step-by-step upgrade. I imagine you'd first upgrade to 2.10.8, and then copy all old files to a new place (something like: mkdir .new_copy; rsync -a * .new_copy; rm -rf *; mv .new_copy/* .; rmdir .new_copy) so that all files are re-created with the correct information. Knut's script is a hack and a last-minute resort.

-----Original Message-----
From: lustre-discuss <lustre-discuss-bounces at lists.lustre.org> On Behalf Of Patrick Shopbell
Sent: Wednesday, June 24, 2020 12:36
To: lustre-discuss at lists.lustre.org
Subject: Re: [lustre-discuss] problem after upgrading 2.10.4 to 2.12.4


Hello all,
I have been following this discussion with interest, as we are in the process of a long-overdue upgrade of our small Lustre system. We are moving everything from

RHEL 6 + Lustre 2.5.2

to

RHEL 7 + Lustre 2.8.0

We are taking this route merely because 2.8.0 supported both RHEL 6 and 7, and so we could keep running, to some extent. (In reality, we have found that v2.8 clients crash our v2.5 MGS on a pretty regular basis.)

Once our OS upgrades are done, the plan is to then take everything to

RHEL 7 + Lustre 2.12.x

From what I gather on this thread, however... I should expect to have some difficulty reading most of my files, since we have been running 2.5 for a long time. So I should plan on running Knut's 'update_25_objects' script on all of my OSTs? Is that correct? Do I need to do that at Lustre 2.8.0, or not until I get to v2.12? Also, I assume this issue is independent of the underlying filesystem - we are still running ldiskfs on our 12 OSTs, rather than ZFS.

Thanks so much. This list is always very helpful and interesting.
--
Patrick


On 6/24/20 1:16 AM, Franke, Knut wrote:
> Am Dienstag, den 23.06.2020, 20:03 +0000 schrieb Hebenstreit, Michael:
>> Is there any way to stop the scans on the OSTs?
> Yes, by re-mounting them with -o noscrub. This doesn't fix the issue 
> though.
>
>> Is there any way to force the file system checks?
> As shown in your second mail, the scrubs are already running.
> Unfortunately, they don't (as of Lustre 2.12.4) fix the issue.
>
>> Has anyone found a workaround for the FID sequence errors?
> Yes, see the script attached to LU-13392. In short:
>
> 0. Make sure you have a backup. This might eat your lunch and fry your
>    cat for afters.
> 1. Enable the canmount property on the backend filesystem. For example:
>     [oss]# zfs set canmount=on mountpoint=/mnt/ostX ${fsname}-ost/ost
> 2. Mount the target as 'zfs'. For example:
>     [oss]# zfs mount ${fsname}-ost/ost
> 3. update_25_objects /mnt/ostX
> 4. Unmount and remount the OST as 'lustre'.
>
> This will rewrite the extended attributes of OST objects created by 
> Lustre 2.4/2.5 to a format compatible with 2.12.
>
>> Can I downgrade from 2.12.4 to 2.10.8 without destroying the FS?
> We've done this successfully, but again - no guarantees.
>
>> Has the error described in https://jira.whamcloud.com/browse/LU-13392
>>   been fixed in 2.12.5?
> I don't think so.
>
> Cheers,
> Knut
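Knut's quoted steps, restated as one hedged command sequence. Everything here is a placeholder or an assumption: ${fsname}, the /mnt/ostX mountpoint, and the path to the update_25_objects script attached to LU-13392; the initial umount is implied by step 4 of his list.

```shell
# Per OST, on the owning OSS. Names below are placeholders - adjust to your site.
umount /mnt/ostX                                            # take the OST out of service
zfs set canmount=on mountpoint=/mnt/ostX ${fsname}-ost/ost  # step 1: allow plain zfs mounts
zfs mount ${fsname}-ost/ost                                 # step 2: mount the backend as zfs
./update_25_objects /mnt/ostX                               # step 3: rewrite 2.4/2.5-era xattrs
zfs umount ${fsname}-ost/ost                                # step 4: unmount...
mount -t lustre ${fsname}-ost/ost /mnt/ostX                 # ...and remount as lustre
```

As Knut says, take a backup first - this rewrites on-disk extended attributes in place, with no guarantees.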


-- 

*--------------------------------------------------------------------*
| Patrick Shopbell               Department of Astronomy             |
| pls at astro.caltech.edu          Mail Code 249-17                    |
| (626) 395-4097                 California Institute of Technology  |
| (626) 568-9352  (FAX)          Pasadena, CA  91125                 |
| WWW: http://www.astro.caltech.edu/~pls/                            |
*--------------------------------------------------------------------*

_______________________________________________
lustre-discuss mailing list
lustre-discuss at lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

