[lustre-discuss] MGS+MDT migration to a new storage using LVM tools

David Cohen cdavid at physics.technion.ac.il
Tue Jul 21 01:39:18 PDT 2020


Thanks Andreas for your detailed reply.

I took your advice on the MDT0000 naming.
Now that the migration is complete, I want to share some major problems I had
along the way.

I don't know where to point the blame: the Lustre kernel, e2fsprogs, the SRP
tools, the multipath version, the LVM version, or the move from 512-byte to
4096-byte sectors. But as soon as I created the mirror, the server went into
a kernel panic / core dump loop.
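
In hindsight, a quick sanity check of the sector sizes before creating the
mirror might have pointed at the 512 vs. 4096 suspect earlier. Something
along these lines (PV names as used in the plan quoted below; purely a
check, not a fix):

blockdev --getss --getpbsz /dev/mapper/MDT0001   # old PV: logical/physical sector size
blockdev --getss --getpbsz /dev/mapper/MDT0002   # new PV: logical/physical sector size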

I managed to stop it only by breaking the mirror from another server
connected to the same storage.
It took me a full day to recover the system.
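
For anyone hitting the same panic loop: a rough sketch of what breaking the
mirror from a second server attached to the same storage can look like,
assuming the VG/LV/PV names used in this thread and that the crashing MDS is
kept down (exact commands may differ on your setup):

vgchange -ay lustre_pool                                        # make the VG active on the second server
lvconvert -m 0 /dev/lustre_pool/TECH_MDT /dev/mapper/MDT0002    # drop the half-synced new leg

That removes the mirror image on the new PV and leaves the original copy on
/dev/mapper/MDT0001 untouched.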

Today I restarted the process, this time from a different server (not
running Lustre) that I had already used for a VM LUN LVM migration.
The exact same procedure ran flawlessly, and I only needed to refresh the
LVM on the MDS to be able to mount the migrated MDT.
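
For completeness, refreshing the LVM on the MDS is roughly a matter of
re-scanning the devices and reactivating the VG before mounting again,
something like this (the mount point here is only an example):

multipath -r                                          # reload the multipath maps
pvscan --cache                                        # re-read LVM metadata from the PVs
vgchange -ay lustre_pool                              # activate the volume group
mount -t lustre /dev/lustre_pool/TECH_MDT /mnt/mdt    # remount the MDT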

Cheers,
David

On Sun, Jul 19, 2020 at 12:27 PM Andreas Dilger <adilger at dilger.ca> wrote:

> On Jul 19, 2020, at 12:41 AM, David Cohen <cdavid at physics.technion.ac.il>
> wrote:
> >
> > Hi,
> > We have a combined MGS+MDT and I'm looking to migrate to new storage
> > with minimal disruption to the running jobs on the cluster.
> >
> > Can anyone find problems in the scenario below and/or suggest another
> > solution?
> > I would also appreciate "no problems" replies to reassure me about the
> > scenario before I proceed.
> >
> > Current configuration:
> > The MDT is a logical volume in the lustre_pool VG on the
> > /dev/mapper/MDT0001 PV.
>
> I've been running Lustre on LVM at home for many years, and have done
> pvmove
> of the underlying storage to new devices without any problems.
>
> > Migration plan:
> > Add the new disk /dev/mapper/MDT0002 (multipath)
>
> I would really recommend that you *not* use MDT0002 as the name of the PV.
> This is very confusing because the MDT itself (at the Lustre level) is
> almost certainly named "<fsname>-MDT0000", and if you ever add new MDTs to
> this filesystem it will be confusing as to which *Lustre* MDT is on which
> underlying PV.  Instead, I'd take the opportunity to name this "MDT0000" to
> match the actual Lustre MDT0000 target name.
>
> > extend the VG:
> > pvcreate /dev/mapper/MDT0002
> > vgextend lustre_pool /dev/mapper/MDT0002
> > mirror the mdt to the new disk:
> > lvconvert -m 1 /dev/lustre_pool/TECH_MDT /dev/mapper/MDT0002
>
> I typically just use "pvmove", but doing this by adding a mirror and then
> splitting it off is probably safer.  That would still leave you with a full
> copy of the MDT on the original PV if something happened in the middle.
>
> > wait for the mirror to sync:
> > lvs -o+devices
> > when it's fully synced, unmount the MDT and remove the old disk from
> > the mirror:
> > lvconvert -m 0 /dev/lustre_pool/TECH_MDT /dev/mapper/MDT0001
> > and remove the old disk from the pool:
> > vgreduce lustre_pool /dev/mapper/MDT0001
> > pvremove /dev/mapper/MDT0001
> > remount the MDT and give the clients a few minutes to recover the
> > connection.
>
> In my experience with pvmove, there is no need to do anything with the
> clients,
> as long as you are not also moving the MDT to a new server, since the
> LVM/DM
> operations are totally transparent to both the Lustre server and client.
>
> After my pvmove (your "lvconvert -m 0"), I would just vgreduce the old PV
> from
> the VG, and then leave it in the system (internal HDD) until the next time
> I
> needed to shut down the server.  If you have hot-plug capability for the
> PVs,
> then you don't even need to wait for that.
>
> Cheers, Andreas