[Lustre-discuss] [URGENT] Lustre 1.6.4.1 data loss bug

Lundgren, Andrew Andrew.Lundgren at Level3.com
Thu Jan 17 14:05:04 PST 2008


We are getting ready to deploy a brand new cluster.  Any time frame on 1.6.4.2?

--
Andrew

> -----Original Message-----
> From: lustre-discuss-bounces at clusterfs.com
> [mailto:lustre-discuss-bounces at clusterfs.com] On Behalf Of
> Andreas Dilger
> Sent: Thursday, January 17, 2008 1:31 PM
> To: Harald van Pee
> Cc: Lustre User Discussion Mailing List
> Subject: Re: [Lustre-discuss] [URGENT] Lustre 1.6.4.1 data loss bug
>
> On Jan 17, 2008  20:21 +0100, Harald van Pee wrote:
> > this is not good news!
>
> Definitely not, but it is hoped that by releasing a
> notification of this issue, any problems with existing
> systems can be avoided.
>
> > Just to be sure, what does 'relatively new Lustre filesystems'
> > or 'newly formatted OSTs' mean?
>
> This means "any OSTs with < 20000 objects ever created", no
> matter how old they actually are.
>
> > Is an updated filesystem (from v1.6.2), which was not newly
> > formatted but still has fewer than 20000 objects ever created
> > on it, also affected by this bug?
> > Or only filesystems first used with 1.6.4.1?
>
> It doesn't matter what versions were previously used; the
> problem exists only while a 1.6.4.1 MDS is in use, due to a
> defect introduced while fixing another, far less common problem.
>
> > On Thursday 17 January 2008 07:35 pm, Andreas Dilger wrote:
> > > Attention to all Lustre users.
> > >
> > > There was a serious problem discovered with only the 1.6.4.1
> > > release which could lead to major data loss on relatively new
> > > Lustre filesystems in certain situations.  A 1.6.4.2 release that
> > > fixes the problem is being prepared, and workarounds are
> > > available for existing 1.6.4.1 users, but in the meantime
> > > customers should be aware of the problem and take measures to
> > > avoid it (described at the end of this email).
> > >
> > > The problem is described in bug 14631, and while there are no
> > > known cases of it impacting a production environment, the
> > > consequences can be severe and all users should take note.  The
> > > bug can cause objects on newly formatted OSTs to be deleted if
> > > the following conditions are all true:
> > >
> > > OST has had fewer than 20000 objects created on it ever
> > > -------------------------------------------------------
> > > This can be seen on each OST via
> > > "cat /proc/fs/lustre/obdfilter/*/last_id", which reports the
> > > highest object ID ever created on that OST.  If this number is
> > > greater than 20000, that OST is not at risk of data loss.
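
For anyone who wants to script that check across an OSS, a minimal
sketch (it assumes bash, the /proc path quoted above, and that last_id
reports a single decimal number per OST, as described):

    #!/bin/bash
    # Flag any OST on this OSS whose highest-ever object ID is still
    # below the 20000 threshold mentioned above.
    THRESHOLD=20000
    for f in /proc/fs/lustre/obdfilter/*/last_id; do
        ost=$(basename "$(dirname "$f")")
        last_id=$(cat "$f")
        if [ "$last_id" -lt "$THRESHOLD" ]; then
            echo "WARNING: $ost last_id=$last_id (< $THRESHOLD), at risk"
        else
            echo "OK:      $ost last_id=$last_id"
        fi
    done
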
> > >
> > > The OST must be in recovery at the time the MDT is first mounted
> > > ----------------------------------------------------------------
> > > This would happen if the OSS node crashed, or if the OST
> > > filesystem is unmounted while the MDT or a client is still
> > > connected.  Unmounting all clients and MDT before the OST is
> > > always the correct process and will avoid this problem, but it is
> > > also possible to force unmount the OST with "umount -f /mnt/ost*"
> > > (or path as appropriate) to evict all connections and avoid the
> > > problem.
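
As a rough illustration of that shutdown order (the mount points
/mnt/lustre, /mnt/mdt and /mnt/ost* are only examples; substitute your
own):

    # On every client node: unmount the Lustre client first.
    umount /mnt/lustre

    # On the MDS node: unmount the MDT next.
    umount /mnt/mdt

    # On each OSS node: unmount the OSTs last; -f evicts any
    # connections that are still hanging around.
    umount -f /mnt/ost*
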
> > >
> > > If the OST is in recovery at mount time then it can be mounted
> > > before the MDT, and "lctl --device {OST device number}
> > > abort_recovery" used to abort recovery before the MDT is
> > > mounted.  Alternatively, the OST will only wait a specific time
> > > for recovery (4:10 by default, actual value printed in dmesg) and
> > > this can be allowed to expire before mounting the MDT to avoid
> > > the problem.
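
A sketch of that sequence on the OSS, assuming a 1.6.x-style
/proc/fs/lustre/obdfilter/*/recovery_status file and that the OST's
local device number appears in the first column of "lctl dl" output
(check both on your own system before relying on them):

    # See whether any OST on this OSS is still in recovery.
    grep -H status: /proc/fs/lustre/obdfilter/*/recovery_status

    # Look up the local device number of the OST in question.
    lctl dl

    # Abort recovery on that device before the MDT is mounted,
    # e.g. if "lctl dl" showed the OST as device 7:
    lctl --device 7 abort_recovery
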
> > >
> > > The MDT is not in recovery when it connects to the OST(s)
> > > ---------------------------------------------------------
> > > If the MDT is not in recovery at mount time (i.e. it was shut
> > > down cleanly), but the OST is in recovery, then the MDT will try
> > > to get information from the OST on existing objects, but fail.
> > > Later in the startup process the MDT would incorrectly signal the
> > > OST to delete all unused objects.  If the MDT is in recovery at
> > > startup, then the MDT recovery period will expire after the OST
> > > recovery and the problem will not be triggered.  If the OSTs are
> > > mounted and are not in recovery when the MDT mounts, then the
> > > problem will also not be triggered.
> > >
> > >
> > > To avoid triggering the problem:
> > > --------------------------------
> > > - unmount the clients and MDT before the OST.  When unmounting
> > > the OST, use "umount -f /mnt/ost*" to force disconnect all
> > > clients.
> > > - mount the OSTs before the MDT, and wait for the recovery to
> > > time out (or cancel it, as above) before mounting the MDT
> > > - create at least 20000 objects on each OST (a sketch of doing
> > > this follows below).  Specific OSTs can be targeted via
> > > "lfs setstripe -i {OST index} /path/to/lustre/file".  These
> > > objects do not need to remain on the OST; there just have to have
> > > been that many objects ever created on the OST to activate a
> > > sanity check when the 1.6.4.1 MDT connects to the OST.
> > > - upgrade to Lustre 1.6.4.2 when available
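
A minimal sketch of that object-creation workaround for one OST,
assuming a client mount at /mnt/lustre and OST index 0 (both just
examples), and using only the "lfs setstripe -i" form quoted above:

    #!/bin/bash
    # Create 20000 empty files whose first stripe object lands on the
    # chosen OST, pushing its last_id past the 20000 threshold.
    OST_INDEX=0
    DIR=/mnt/lustre/padding.ost$OST_INDEX
    mkdir -p "$DIR"
    for i in $(seq 1 20000); do
        lfs setstripe -i "$OST_INDEX" "$DIR/obj.$i"
    done
    # Only the count of objects ever created matters, so the files can
    # be removed afterwards:
    # rm -rf "$DIR"
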
> > >
> > > Cheers, Andreas
> > > --
> > > Andreas Dilger
> > > Sr. Staff Engineer, Lustre Group
> > > Sun Microsystems of Canada, Inc.
> >
> > --
> > Harald van Pee
> >
> > Helmholtz-Institut fuer Strahlen- und Kernphysik der
> Universitaet Bonn
> >
>
> Cheers, Andreas
> --
> Andreas Dilger
> Sr. Staff Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.
>
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at clusterfs.com
> https://mail.clusterfs.com/mailman/listinfo/lustre-discuss
>



