[Lustre-discuss] [URGENT] Lustre 1.6.4.1 data loss bug

Andreas Dilger adilger at sun.com
Thu Jan 17 14:29:26 PST 2008


On Jan 17, 2008  15:05 -0700, Lundgren, Andrew wrote:
> We are getting ready to deploy a brand new cluster.  Any time frame on 1.6.4.2?

It has been built and is undergoing QA testing now.  We hope to have it
ready for Monday, but I can't promise that.

> > -----Original Message-----
> > From: lustre-discuss-bounces at clusterfs.com
> > [mailto:lustre-discuss-bounces at clusterfs.com] On Behalf Of
> > Andreas Dilger
> > Sent: Thursday, January 17, 2008 1:31 PM
> > To: Harald van Pee
> > Cc: Lustre User Discussion Mailing List
> > Subject: Re: [Lustre-discuss] [URGENT] Lustre 1.6.4.1 data loss bug
> >
> > On Jan 17, 2008  20:21 +0100, Harald van Pee wrote:
> > > this is not good news!
> >
> > Definitely not, but we hope that by sending out a notification of
> > this issue, problems with existing systems can be avoided.
> >
> > > Just to be sure, what does 'relatively new Lustre filesystems'
> > > or 'newly formatted OSTs' mean?
> >
> > This means "any OSTs with < 20000 objects ever created", no
> > matter how old they actually are.
> >
> > > Is an updated filesystem (from v1.6.2) which is not newly
> > > formatted, but still has fewer than 20000 objects ever created on
> > > it, also affected by this bug?
> > > Or only filesystems first used with 1.6.4.1?
> >
> > It doesn't matter what versions were previously used; the problem
> > exists only while a 1.6.4.1 MDS is in use, due to a defect
> > introduced while fixing another, far less common problem.
> >
> > > On Thursday 17 January 2008 07:35 pm, Andreas Dilger wrote:
> > > > Attention to all Lustre users.
> > > >
> > > > There was a serious problem discovered in the 1.6.4.1 release only,
> > > > which could lead to major data loss on relatively new Lustre
> > > > filesystems in certain situations.  The 1.6.4.2 release that will
> > > > fix the problem is being prepared, and workarounds are available
> > > > for existing 1.6.4.1 users, but in the meantime customers should
> > > > be aware of the problem and take the measures described at the end
> > > > of this email to avoid it.
> > > >
> > > > The problem is described in bug 14631, and while there are no known
> > > > cases in which it has impacted a production environment, the
> > > > consequences can be severe and all users should take note.  The
> > > > bug can cause objects on newly formatted OSTs to be deleted if all
> > > > of the following conditions are true:
> > > >
> > > > OST has had fewer than 20000 objects created on it ever
> > > > -------------------------------------------------------
> > > > This can be seen on each OST via
> > > > "cat /proc/fs/lustre/obdfilter/*/last_id", which reports the
> > > > highest object ID ever created on that OST.  If this number is
> > > > greater than 20000, that OST is not at risk of data loss.
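> > > >
> > > > For example, a quick check across all OSTs on an OSS node (the
> > > > obdfilter names will of course differ per site) might look like:
> > > >
> > > >   for f in /proc/fs/lustre/obdfilter/*/last_id; do
> > > >       # last_id is the highest object ID ever created on this OST
> > > >       echo "$f: $(cat $f)"
> > > >   done
> > > >
> > > > Any OST reporting a value greater than 20000 is not at risk.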
> > > >
> > > > The OST must be in recovery at the time the MDT is first mounted
> > > > ----------------------------------------------------------------
> > > > This would happen if the OSS node crashed, or if the OST
> > > > filesystem was unmounted while the MDT or a client was still
> > > > connected.  Unmounting all clients and the MDT before the OSTs is
> > > > always the correct procedure and will avoid this problem, but it
> > > > is also possible to force unmount the OST with
> > > > "umount -f /mnt/ost*" (or path as appropriate) to evict all
> > > > connections and avoid the problem.
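> > > >
> > > > A safe shutdown order, assuming the typical /mnt/lustre mount point
> > > > on the clients and /mnt/mdt on the MDS (adjust the paths to your
> > > > own configuration), would be:
> > > >
> > > >   # on every client first
> > > >   umount /mnt/lustre
> > > >   # then on the MDS
> > > >   umount /mnt/mdt
> > > >   # finally on each OSS, forcing eviction of any remaining connections
> > > >   umount -f /mnt/ost*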
> > > >
> > > > If the OST is in recovery at mount time, it can be mounted before
> > > > the MDT and "lctl --device {OST device number} abort_recovery"
> > > > used to abort recovery before the MDT is mounted.  Alternately,
> > > > the OST will only wait a limited time for recovery (4 minutes 10
> > > > seconds by default; the actual value is printed in dmesg), and
> > > > this can be allowed to expire before mounting the MDT to avoid
> > > > the problem.
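> > > >
> > > > For example (the device number shown by "lctl dl" is site-specific;
> > > > the 7 below is only a placeholder):
> > > >
> > > >   # list configured devices and note the number of the OST (obdfilter)
> > > >   lctl dl
> > > >   # abort recovery on that OST before the MDT is mounted
> > > >   lctl --device 7 abort_recovery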
> > > >
> > > > The MDT is not in recovery when it connects to the OST(s)
> > > > ---------------------------------------------------------
> > > > If the MDT is not in recovery at mount time (i.e. it was shut down
> > > > cleanly), but the OST is in recovery, then the MDT will try to get
> > > > information from the OST on existing objects, but fail.  Later in
> > > > the startup process the MDT would incorrectly signal the OST to
> > > > delete all unused objects.  If the MDT is in recovery at startup,
> > > > then the MDT recovery period will expire after the OST recovery,
> > > > and the problem will not be triggered.  If the OSTs are mounted
> > > > and are not in recovery when the MDT mounts, then the problem will
> > > > also not be triggered.
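> > > >
> > > > Whether an OST is still in recovery can be checked on the OSS before
> > > > mounting the MDT, e.g. (if your version exports this proc file):
> > > >
> > > >   cat /proc/fs/lustre/obdfilter/*/recovery_status
> > > >
> > > > and the MDT mounted only once the status no longer shows RECOVERING.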
> > > >
> > > >
> > > > To avoid triggering the problem:
> > > > --------------------------------
> > > > - unmount the clients and the MDT before the OSTs.  When
> > > > unmounting the OSTs use "umount -f /mnt/ost*" to force disconnect
> > > > all clients.
> > > > - mount the OSTs before the MDT, and wait for the recovery to time
> > > > out (or cancel it, as above) before mounting the MDT
> > > > - create at least 20000 objects on each OST; a sketch of one way to
> > > > do this follows below.  Specific OSTs can be targeted via
> > > > "lfs setstripe -i {OST index} /path/to/lustre/file".  These objects
> > > > do not need to remain on the OST; there just have to have been that
> > > > many objects created on the OST, ever, to activate a sanity check
> > > > when the 1.6.4.1 MDT connects to the OST.
> > > > - upgrade to Lustre 1.6.4.2 when available
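> > > >
> > > > A rough sketch of pre-creating objects on one OST (OST index 0 and
> > > > the /mnt/lustre client mount point are only examples; repeat for
> > > > each OST index):
> > > >
> > > >   mkdir -p /mnt/lustre/padding
> > > >   for i in $(seq 1 20000); do
> > > >       # create an empty file whose single object lands on OST index 0
> > > >       lfs setstripe -i 0 /mnt/lustre/padding/f$i
> > > >   done
> > > >   # the files can be removed afterwards; only the creation count matters
> > > >   rm -rf /mnt/lustre/padding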
> > > >
> > > > Cheers, Andreas
> > > > --
> > > > Andreas Dilger
> > > > Sr. Staff Engineer, Lustre Group
> > > > Sun Microsystems of Canada, Inc.
> > >
> > > --
> > > Harald van Pee
> > >
> > > Helmholtz-Institut fuer Strahlen- und Kernphysik der
> > > Universitaet Bonn
> > >
> >
> > Cheers, Andreas
> > --
> > Andreas Dilger
> > Sr. Staff Engineer, Lustre Group
> > Sun Microsystems of Canada, Inc.
> >

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.



