[lustre-discuss] design to enable kernel updates

Vicker, Darby (JSC-EG311) darby.vicker-1 at nasa.gov
Mon Feb 6 12:58:37 PST 2017


Agreed.  We are just about to go into production on our next LFS with the 
setup described.  We had to get past a bug in the MGS failover for 
dual-homed servers, but as of last week that is fixed and everything is 
working great (see the "MGS failover problem" thread on this mailing list 
from this month and last).  We are in the process of syncing our existing 
LFS to this new one, and I've failed over/rebooted/upgraded the new LFS 
servers many times now to make sure we can do this in practice when the 
new LFS goes into production.  It's working beautifully.  

Many thanks to the lustre developers for their continued efforts.  We have 
been using, and have been fans of, lustre for quite some time now, and it 
just keeps getting better.  

-----Original Message-----
From: lustre-discuss <lustre-discuss-bounces at lists.lustre.org> on behalf of Ben Evans <bevans at cray.com>
Date: Monday, February 6, 2017 at 2:22 PM
To: Brian Andrus <toomuchit at gmail.com>, "lustre-discuss at lists.lustre.org" <lustre-discuss at lists.lustre.org>
Subject: Re: [lustre-discuss] design to enable kernel updates

It's certainly possible.  When I've done that sort of thing, you upgrade
the OS on all the servers first, then boot half of them (the A side) into
the new image; all of their targets will fail over to the B servers.  Once
the A side is up, reboot the B half to the new OS.  Finally, do a failback
to the "normal" running state.

At least when I've done it, you'll want to do the failovers manually so
the HA infrastructure doesn't surprise you for any reason.
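
For reference, here is a minimal sketch of that manual per-target failover
step, assuming shared (SAN) storage visible to both servers, passwordless
ssh from an admin node, and the HA stack paused or not managing the target;
the hostnames, device path, and mount point below are hypothetical
placeholders, not anything from a real deployment:

    #!/usr/bin/env python3
    """Sketch of one manual OST failover step in a rolling kernel update.

    Hostnames, device path, and mount point are hypothetical; adjust them
    to your own A/B failover pairs.
    """
    import subprocess

    A_SERVER = "oss-a01"                # hypothetical A-side server
    B_SERVER = "oss-b01"                # hypothetical B-side partner
    DEVICE = "/dev/mapper/ost0000"      # hypothetical shared OST device
    MOUNTPOINT = "/mnt/lustre/ost0000"  # hypothetical mount point

    def ssh(host, command):
        """Run a command on a remote host, stopping on the first failure."""
        print(f"[{host}] {command}")
        subprocess.run(["ssh", host, command], check=True)

    def fail_over(src, dst):
        """Move the target from src to dst: unmount on src, mount on dst."""
        ssh(src, f"umount {MOUNTPOINT}")
        ssh(dst, f"mount -t lustre {DEVICE} {MOUNTPOINT}")

    # Push the A-side target to its B partner, then upgrade/reboot the A server.
    fail_over(A_SERVER, B_SERVER)
    # ... install the new kernel and reboot A_SERVER here ...

    # Once the A server is back on the new kernel, fail the target back.
    fail_over(B_SERVER, A_SERVER)

Doing it one target at a time like this keeps the HA infrastructure out of
the loop, which is the point of driving the failovers by hand.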

-Ben

On 2/6/17, 2:54 PM, "lustre-discuss on behalf of Brian Andrus"
<lustre-discuss-bounces at lists.lustre.org on behalf of toomuchit at gmail.com>
wrote:

>All,
>
>I have been contemplating how lustre could be configured such that I
>could update the kernel on each server without downtime.
>
>It seems this is _almost_ possible when you have a SAN system so you
>have failover for OSTs and MDTs. But the MGS/MGT seems to be the
>problematic one, since rebooting it seems to cause downtime that cannot
>be avoided.
>
>If you have a system where the disks are physically part of the OSS
>hardware, you are out of luck. The hypothetical scenario I am using is
>someone with a VM whose qcow image lives on a lustre mount (basically
>an active, open file being read/written to continuously). How could
>lustre be configured to ensure anyone on the VM would not notice a kernel
>upgrade to the underlying lustre servers?
>
>
>Could such a setup be done? It seems that would be a better use case for
>something like GPFS or Gluster, but being a die-hard lustre enthusiast,
>I want to at least show it could be done.
>
>
>Thanks in advance,
>
>Brian Andrus
>

_______________________________________________
lustre-discuss mailing list
lustre-discuss at lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org



