[lustre-discuss] problem after upgrading 2.10.4 to 2.12.4

Wed Jun 24 12:37:39 PDT 2020

Thanks for that info, Michael. So it sounds like I could go ahead and
get the 2.8 upgrades done on everything, and then hold off on any
further upgrades. We don't have a very urgent user base, so that
would work okay for us.

Hopefully this is still an "active" issue that might be solved...

Thanks again,
Patrick

On 6/24/20 11:43 AM, Hebenstreit, Michael wrote:
> I would not plan a direct upgrade until Whamcloud fixes the underlying issue. Currently the only viable way seems to be a step-by-step upgrade. I imagine you'd first upgrade to 2.10.8, and then copy all old files to a new place (something like: mkdir .new_copy; rsync -a * .new_copy; rm -rf *; mv .new_copy/* .; rmdir .new_copy) so that all files have been re-created with correct information. Knut's script is a hack and a last resort.
>
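A minimal shell sketch of the copy-and-replace step described above, for a single directory. It is not the exact command from the message: the function name recreate_in_place is hypothetical, cp -a stands in for rsync -a, and find is used instead of the * glob so that dotfiles are included. Each step runs only if the previous one succeeded, so a failed copy never reaches the delete. Try it on a scratch directory before pointing it at real data.

```shell
# Hypothetical helper (not from the thread): re-create every entry of a
# directory so that each file ends up as a freshly written copy.
# Assumption: cp -a replaces the rsync -a used in the original recipe.
recreate_in_place() {
    dir=$1
    tmp="$dir/.new_copy"

    mkdir "$tmp" || return 1

    # 1. Copy every top-level entry (dotfiles included) into the staging dir.
    find "$dir" -mindepth 1 -maxdepth 1 ! -name .new_copy \
        -exec sh -c 'cp -a "$@" "$0"/' "$tmp" {} + || return 1

    # 2. Remove the originals, leaving only the staging dir.
    find "$dir" -mindepth 1 -maxdepth 1 ! -name .new_copy \
        -exec rm -rf {} + || return 1

    # 3. Move the fresh copies back and drop the staging dir.
    find "$tmp" -mindepth 1 -maxdepth 1 \
        -exec sh -c 'mv "$@" "$0"/' "$dir" {} + || return 1
    rmdir "$tmp"
}
```

Usage would look like: recreate_in_place /path/to/some/directory (the path is a placeholder). The staging directory needs enough free space for a second copy of the data while the function runs.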
> -----Original Message-----
> From: lustre-discuss <lustre-discuss-bounces at lists.lustre.org> On Behalf Of Patrick Shopbell
> Sent: Wednesday, June 24, 2020 12:36
> To: lustre-discuss at lists.lustre.org
> Subject: Re: [lustre-discuss] problem after upgrading 2.10.4 to 2.12.4
>
>
> Hello all,
> I have been following this discussion with interest, as we are in the process of a long-overdue upgrade of our small Lustre system. We are moving everything from
>
> RHEL 6 + Lustre 2.5.2
>
> to
>
> RHEL 7 + Lustre 2.8.0
>
> We are taking this route merely because 2.8.0 supported both RHEL 6 and 7, and so we could keep running, to some extent. (In reality, we have found that v2.8 clients crash our v2.5 MGS on a pretty regular basis.)
>
> Once our OS upgrades are done, the plan is to then take everything to
>
> RHEL 7 + Lustre 2.12.x
>
> From what I gather on this thread, however... I should expect to have some difficulty reading most of my files, since we have been running 2.5 for a long time. And so I should plan on running Knut's 'update_25_objects' on all of my OSTs? Is that correct? Would I need to do that at Lustre 2.8.0, or not until I get to v2.12? Also, I assume this issue is independent of the underlying filesystem - we are still running lustrefs on our 12 OSTs, rather than ZFS.
>
> Thanks so much. This list is always very helpful and interesting.
> --
> Patrick
>
>
> On 6/24/20 1:16 AM, Franke, Knut wrote:
>> On Tuesday, 23.06.2020 at 20:03 +0000, Hebenstreit, Michael wrote:
>>> Is there any way to stop the scans on the OSTs?
>> Yes, by re-mounting them with -o noscrub. This doesn't fix the issue
>> though.
>>
>>> Is there any way to force the file system checks?
>> As shown in your second mail, the scrubs are already running.
>> Unfortunately, they don't (as of Lustre 2.12.4) fix the issue.
>>
>>> Has anyone found a workaround for the FID sequence errors?
>> Yes, see the script attached to LU-13392. In short:
>>
>> 0. Make sure you have a backup. This might eat your lunch and fry your
>> cat for afters.
>> 1. Enable the canmount property on the backend filesystem. For example:
>>      [oss]# zfs set canmount=on mountpoint=/mnt/ostX ${fsname}-ost/ost
>> 2. Mount the target as 'zfs'. For example:
>>      [oss]# zfs mount ${fsname}-ost/ost
>> 3. Run update_25_objects /mnt/ostX
>> 4. Unmount and remount the OST as 'lustre'
>>
>> This will rewrite the extended attributes of OST objects created by
>> Lustre 2.4/2.5 to a format compatible with 2.12.
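The steps above can be written out as a dry-run helper that only prints the commands, so the sequence can be reviewed before anything touches an OST. This is a sketch under assumptions: the function name, its arguments, and the final Lustre mount point are hypothetical, and the explicit zfs unmount followed by mount -t lustre is one reading of step 4 rather than something spelled out in the thread. update_25_objects is the script attached to LU-13392. The thread does not say whether canmount should be reverted afterwards, so the sketch leaves it alone.

```shell
# Hypothetical dry-run helper: prints the command sequence, runs nothing.
# Assumes the OST is already unmounted from Lustre and backed up.
#   $1 = ZFS dataset backing the OST        (e.g. "${fsname}-ost/ost")
#   $2 = temporary ZFS mount point          (e.g. /mnt/ostX)
#   $3 = usual Lustre mount point of the OST (assumed; not given in the thread)
print_ost_fix_plan() {
    dataset=$1
    zfs_mnt=$2
    lustre_mnt=$3
    printf '%s\n' \
        "zfs set canmount=on $dataset" \
        "zfs set mountpoint=$zfs_mnt $dataset" \
        "zfs mount $dataset" \
        "update_25_objects $zfs_mnt" \
        "zfs unmount $dataset" \
        "mount -t lustre $dataset $lustre_mnt"
}
```

Calling print_ost_fix_plan with a dataset, a scratch mount point, and the OST's normal mount point prints six lines, one per command, in the order they would be run by hand on the OSS.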
>>
>>> Can I downgrade from 2.12.4 to 2.10.8 without destroying the FS?
>> We've done this successfully, but again - no guarantees.
>>
>>> Has the error described in https://jira.whamcloud.com/browse/LU-13392
>>>    been fixed in 2.12.5?
>> I don't think so.
>>
>> Cheers,
>> Knut
>

-- 

*--------------------------------------------------------------------*
| Patrick Shopbell               Department of Astronomy             |
| pls at astro.caltech.edu          Mail Code 249-17                    |
| (626) 395-4097                 California Institute of Technology  |
| (626) 568-9352  (FAX)          Pasadena, CA  91125                 |
| WWW: http://www.astro.caltech.edu/~pls/                            |
*--------------------------------------------------------------------*