[Lustre-discuss] [URGENT] Lustre 1.6.4.1 data loss bug

Andreas Dilger adilger at sun.com
Thu Jan 17 12:31:10 PST 2008


On Jan 17, 2008  20:21 +0100, Harald van Pee wrote:
> this is not good news!

Definitely not, but the hope is that by publishing a notification of
this issue, problems with existing systems can be avoided.

> Just to be sure, what do 'relatively new Lustre filesystems'
> or 'newly formatted OSTs' mean?

This means "any OSTs with < 20000 objects ever created", no matter
how old they actually are.
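
As a quick way to check this on each OSS, here is a minimal sketch
(assuming a bash shell, the standard /proc layout, and that last_id
holds a single numeric value as described in the advisory below):

    # Print last_id for each local OST and flag those still below 20000.
    for f in /proc/fs/lustre/obdfilter/*/last_id; do
        ost=$(basename $(dirname $f))
        last=$(cat $f)
        if [ "$last" -lt 20000 ]; then
            echo "$ost: last_id=$last  (at risk)"
        else
            echo "$ost: last_id=$last  (ok)"
        fi
    done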

> Is an updated filesystem (from v1.6.2) which is not newly formatted, but
> still has fewer than 20000 objects ever created on it,
> also affected by this bug?
> Or only filesystems first used with 1.6.4.1?

It doesn't matter which versions were previously used; the problem exists
only while a 1.6.4.1 MDS is in use, due to a defect introduced while fixing
another, far less common problem.

> On Thursday 17 January 2008 07:35 pm, Andreas Dilger wrote:
> > Attention to all Lustre users.
> >
> > A serious problem has been discovered that affects only the 1.6.4.1
> > release and can lead to major data loss on relatively new Lustre
> > filesystems in certain situations.  The 1.6.4.2 release that fixes the
> > problem is being prepared, and workarounds are available for existing
> > 1.6.4.1 users, but in the meantime customers should be aware of the
> > problem and take measures to avoid it (described at the end of this email).
> >
> > The problem is described in bug 14631, and while there are no known cases
> > in which it has impacted a production environment, the consequences can be
> > severe and all users should take note.  The bug can cause objects on newly
> > formatted OSTs to be deleted if all of the following conditions are true:
> >
> > OST has had fewer than 20000 objects created on it ever
> > -------------------------------------------------------
> > This can be seen on each OST via "cat /proc/fs/lustre/obdfilter/*/last_id",
> > which reports the highest object ID ever created on that OST.  If this
> > number is greater than 20000, that OST is not at risk of data loss.
> >
> > The OST must be in recovery at the time the MDT is first mounted
> > ----------------------------------------------------------------
> > This would happen if the OSS node crashes, or if the OST filesystem is
> > unmounted while the MDT or a client is still connected.  Unmounting all
> > clients and the MDT before the OST is always the correct procedure and will
> > avoid this problem, but it is also possible to force unmount the OST
> > with "umount -f /mnt/ost*" (or the appropriate path) to evict all
> > connections, which also avoids the problem.
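> >
> > As an illustration, a minimal shutdown sequence (a sketch only; the mount
> > points below are examples):
> >
> >   # On every client node:
> >   umount /mnt/lustre
> >   # On the MDS node:
> >   umount /mnt/mdt
> >   # On each OSS node, last; -f evicts any connections that remain:
> >   umount -f /mnt/ost*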
> >
> > If the OST is in recovery at mount time, it can be mounted before the
> > MDT and "lctl --device {OST device number} abort_recovery" used to abort
> > recovery before the MDT is mounted.  Alternatively, the OST will only wait
> > a limited time for recovery (4 minutes 10 seconds by default; the actual
> > value is printed in dmesg) and this timeout can be allowed to expire before
> > mounting the MDT to avoid the problem.
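> >
> > For example, a sketch of aborting recovery on an OST before mounting the
> > MDT (assumes the standard /proc layout; fill in the device number from
> > the device list):
> >
> >   # On the OSS, check whether the OST is still in recovery:
> >   cat /proc/fs/lustre/obdfilter/*/recovery_status
> >   # List devices to find the OST's device number, then abort recovery:
> >   lctl dl
> >   lctl --device {OST device number} abort_recovery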
> >
> > The MDT is not in recovery when it connects to the OST(s)
> > ---------------------------------------------------------
> > If the MDT is not in recovery at mount time (i.e. it was shut down
> > cleanly), but the OST is in recovery, then the MDT will try to get
> > information from the OST on existing objects, but fail.  Later in
> > the startup process the MDT would incorrectly signal the OST to delete
> > all unused objects.  If the MDT is in recovery at startup, then the
> > MDT recovery period will expire after the OST recovery and the problem
> > will not be triggered.  If the OSTs are mounted and are not in recovery
> > when the MDT mounts, then the problem will also not be triggered.
> >
> >
> > To avoid triggering the problem:
> > --------------------------------
> > - unmount the clients and the MDT before the OSTs.  When unmounting
> > the OSTs use "umount -f /mnt/ost*" to force disconnect all clients.
> > - mount the OSTs before the MDT, and wait for the recovery to time out
> > (or cancel it, as above) before mounting the MDT.
> > - create at least 20000 objects on each OST.  Specific OSTs can be
> > targeted via "lfs setstripe -i {OST index} /path/to/lustre/file".
> > These objects do not need to remain on the OST; there just have to have
> > been that many objects created on the OST ever, to activate a sanity
> > check when the 1.6.4.1 MDT connects to the OST (see the sketch after
> > this list).
> > - upgrade to Lustre 1.6.4.2 when available
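> >
> > As an illustration of the object-creation workaround above, a minimal
> > sketch (bash on a client; the mount point, directory, and OST index are
> > examples, and the files can be removed once created):
> >
> >   # Create 20000 empty files whose first stripe is placed on OST index 0,
> >   # so that at least 20000 objects have ever been created on that OST.
> >   mkdir -p /mnt/lustre/prefill
> >   for i in $(seq 1 20000); do
> >       lfs setstripe -i 0 /mnt/lustre/prefill/obj.$i
> >   done
> >   # Only the historical object count matters, so the files can be deleted:
> >   rm -rf /mnt/lustre/prefill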
> >
> > Cheers, Andreas
> > --
> > Andreas Dilger
> > Sr. Staff Engineer, Lustre Group
> > Sun Microsystems of Canada, Inc.
> > _______________________________________________
> > Lustre-discuss mailing list
> > Lustre-discuss at clusterfs.com
> > https://mail.clusterfs.com/mailman/listinfo/lustre-discuss
> 
> -- 
> Harald van Pee
> 
> Helmholtz-Institut fuer Strahlen- und Kernphysik der Universitaet Bonn
> 
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at clusterfs.com
> https://mail.clusterfs.com/mailman/listinfo/lustre-discuss

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.



