[lustre-discuss] Is live upgrade of 2.4 to 2.5 unproblematic?

Peter Bortas bortas at gmail.com
Mon Jul 11 05:46:28 PDT 2016


Hi Patrick,

Thanks for the additional input! I'll skip the exciting live upgrade
this time, then.

Regards,
-- 
Peter Bortas, NSC

On Mon, Jul 11, 2016 at 1:39 AM, Patrick Farrell <paf at cray.com> wrote:
> Because of the issue Andreas highlighted (the large number of possible
> states a running job can be in), at Cray we do our upgrades with the system
> quiet. Live upgrades aren't something we even consider: the potential
> damage is too large for the time saved, especially since the actual
> *upgrade* usually doesn't take very long. For 2.4 to 2.5, the 'clean'
> procedure is simply to stop activity on the filesystem, unmount it on the
> clients, stop/unmount it on the server side, install the new Lustre RPMs,
> and start it up again. This is relatively quick.
>
> ________________________________
> From: lustre-discuss <lustre-discuss-bounces at lists.lustre.org> on behalf of
> Dilger, Andreas <andreas.dilger at intel.com>
> Sent: Sunday, July 10, 2016 5:53:38 PM
> To: Peter Bortas
> Cc: lustre-discuss at lists.lustre.org
> Subject: Re: [lustre-discuss] Is live upgrade of 2.4 to 2.5 unproblematic?
>
> We typically test 2.x->2.x+1 upgrades, both live and offline, for every
> version of Lustre. That said, there are a large number of possible states
> that may occur with a running job, so it isn't possible to test everything.
> If you are ready to abort the long-running job, then trying the live upgrade
> and having to restart if it fails isn't any worse.
>
> I'd always recommend making a backup of the MDT, regardless of whether you
> are doing an upgrade or not, since it is a lot easier to restore only the
> MDT if there are problems than to restore the whole filesystem.
>
> Cheers, Andreas
>
>> On Jul 8, 2016, at 09:08, Peter Bortas <bortas at gmail.com> wrote:
>>
>> I'm upgrading a few ZFS backed filesystems from 2.4.2 to 2.5.3 (both
>> from the llnl chaos branch). Clients are already running 2.5EE. It's a
>> simple setup with no failover or mirroring of MDSs or OSSs. Originally
>> the plan was to do this with the filesystems unmounted on the clients,
>> but it looks like it will be hard to get a window to do that any time
>> soon.
>>
>> Are there any known problems just doing an online upgrade 2.4 -> 2.5?
>>
>> Is the recommended method still OSSs first and MDS last?
>>
>> (Obviously the clients will lock up if they access these filesystems,
>> but locking them up for a fraction of a day beats aborting a 7-day
>> compute job.)
>>
>> Regards,
>> --
>> Peter Bortas, NSC
>> _______________________________________________
>> lustre-discuss mailing list
>> lustre-discuss at lists.lustre.org
>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

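Patrick's 'clean' 2.4 to 2.5 procedure above can be written out as an ordered, per-host plan. The Python sketch below only builds the list of commands and runs nothing. Every host name, ZFS dataset, mount point and the install command are hypothetical placeholders. The server-side stop order is whatever the caller passes in, and restarting targets in reverse of that order is an assumption; check the Lustre Operations Manual for the start/stop order recommended for your release.

```python
from dataclasses import dataclass

# Hypothetical example: every host name, ZFS dataset, mount point and
# install command used with this code is a placeholder, not a real site value.


@dataclass(frozen=True)
class Target:
    host: str        # server (MDS or OSS) that serves this target
    dataset: str     # ZFS dataset backing the target, e.g. "mdtpool/mdt0"
    mountpoint: str  # where the target is mounted on that server


def offline_upgrade_plan(clients, client_source, client_mount, servers, install_cmd):
    """Return an ordered list of (host, command) pairs. Nothing is executed.

    `servers` lists the targets in the order they should be stopped. They are
    restarted in reverse order, which is an assumption: check the Lustre
    manual for the startup order recommended for your release.
    """
    plan = []
    # 1. With filesystem activity stopped, unmount Lustre on every client.
    for host in clients:
        plan.append((host, f"umount {client_mount}"))
    # 2. Stop the servers by unmounting each target in the given order.
    for t in servers:
        plan.append((t.host, f"umount {t.mountpoint}"))
    # 3. Install the new Lustre RPMs once per server host (first-seen order).
    for host in dict.fromkeys(t.host for t in servers):
        plan.append((host, install_cmd))
    # 4. Start the targets again, then remount on the clients.
    for t in reversed(servers):
        plan.append((t.host, f"mount -t lustre {t.dataset} {t.mountpoint}"))
    for host in clients:
        plan.append((host, f"mount -t lustre {client_source} {client_mount}"))
    return plan


if __name__ == "__main__":
    servers = [
        Target("mds1", "mdtpool/mdt0", "/mnt/lustre/mdt0"),
        Target("oss1", "ostpool/ost0", "/mnt/lustre/ost0"),
        Target("oss2", "ostpool/ost1", "/mnt/lustre/ost1"),
    ]
    plan = offline_upgrade_plan(
        ["client1", "client2"], "mds1@tcp:/scratch", "/scratch",
        servers, "yum -y update 'lustre*'",
    )
    for host, cmd in plan:
        print(f"{host}: {cmd}")
```

Keeping the plan as data rather than running it over ssh makes it easy to review with the rest of the team before the maintenance window; in Peter's case the clients already run 2.5EE, so only the server hosts get the install step.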

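For ZFS-backed targets like Peter's, one way to follow Andreas's advice to back up the MDT is a ZFS snapshot of the MDT dataset streamed to a file with `zfs send`. This is a minimal sketch that only builds the two commands; the dataset name, backup directory and default tag format are hypothetical, and whether this approach suits a given site is an assumption to verify against the Lustre and OpenZFS documentation.

```python
import shlex
from datetime import date


def mdt_backup_commands(dataset, backup_dir, tag=None):
    """Build (but do not run) commands that snapshot a ZFS-backed MDT
    dataset and stream the snapshot to a file in `backup_dir`.

    `dataset`, `backup_dir` and the default tag format are hypothetical.
    """
    # Default snapshot name, e.g. "pre-upgrade-20160711".
    tag = tag or f"pre-upgrade-{date.today():%Y%m%d}"
    snap = f"{dataset}@{tag}"
    # Flatten the dataset path into a single file name for the stream.
    outfile = f"{backup_dir.rstrip('/')}/{dataset.replace('/', '_')}@{tag}.zfs"
    return [
        # Point-in-time snapshot of the MDT dataset.
        f"zfs snapshot {shlex.quote(snap)}",
        # Serialize the snapshot to a file (the redirect needs a shell).
        f"zfs send {shlex.quote(snap)} > {shlex.quote(outfile)}",
    ]


if __name__ == "__main__":
    for cmd in mdt_backup_commands("mdtpool/mdt0", "/backup"):
        print(cmd)
```

Restoring would go the other way through `zfs receive`; that path is worth rehearsing on a spare pool before relying on the backup.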