[Lustre-discuss] [URGENT] Lustre 1.6.4.1 data loss bug

Harald van Pee pee at hiskp.uni-bonn.de
Thu Jan 17 11:21:46 PST 2008


Hi,

this is not good news!
Just to be sure: what does 'relatively new Lustre filesystems'
or 'newly formatted OSTs' mean?
Is a filesystem upgraded from v1.6.2, which is not newly formatted but
still has had fewer than 20000 objects created on it, also affected
by this bug?
Or does it affect only filesystems first used with 1.6.4.1?

Harald

On Thursday 17 January 2008 07:35 pm, Andreas Dilger wrote:
> Attention to all Lustre users.
>
> There was a serious problem discovered that affects only the 1.6.4.1
> release and could lead to major data loss on relatively new Lustre
> filesystems in certain situations.  The 1.6.4.2 release being prepared
> will fix the problem, and workarounds are available for existing
> 1.6.4.1 users, but in the meantime customers should be aware of the
> problem and take measures to avoid it (described at the end of this
> email).
>
> The problem is described in bug 14631, and while there are no known
> cases in which it has impacted a production environment, the
> consequences can be severe and all users should take note.  The bug
> can cause objects on newly formatted OSTs to be deleted if all of the
> following conditions are true:
>
> OST has had fewer than 20000 objects created on it ever
> -------------------------------------------------------
> This can be seen on each OST via "cat /proc/fs/lustre/obdfilter/*/last_id"
> which reports the highest object ID ever created on that OST.  If this
> number is greater than 20000 that OST is not at risk of data loss.
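>
> For example, a quick way to check every OST on an OSS node (a minimal
> sketch using the proc path above; output format is illustrative):
>
>     for f in /proc/fs/lustre/obdfilter/*/last_id; do
>         echo "$f: $(cat $f)"
>     done
>
> Any OST reporting a value of 20000 or less is potentially at risk.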
>
> The OST must be in recovery at the time the MDT is first mounted
> ----------------------------------------------------------------
> This would happen if the OSS node crashed, or if the OST filesystem
> was unmounted while the MDT or a client was still connected.
> Unmounting all clients and the MDT before the OST is always the
> correct procedure and will avoid this problem, but it is also possible
> to force-unmount the OST with "umount -f /mnt/ost*" (or path as
> appropriate) to evict all connections.
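>
> As a sketch, a safe shutdown sequence (the mount points are examples;
> adjust to your site):
>
>     # on every client
>     umount /mnt/lustre
>     # on the MDS
>     umount /mnt/mdt
>     # on every OSS, last; -f evicts any remaining connections
>     umount -f /mnt/ost*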
>
> If the OST is in recovery at mount time then it can be mounted before
> the MDT, and "lctl --device {OST device number} abort_recovery" used
> to abort recovery before the MDT is mounted.  Alternatively, the OST
> will only wait a fixed time for recovery (4:10 by default; the actual
> value is printed in dmesg), and this can be allowed to expire before
> mounting the MDT to avoid the problem.
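>
> For example, the device number can be looked up with "lctl dl" and
> recovery aborted before the MDT is mounted (the device number 7 below
> is just an illustration; use whatever "lctl dl" reports for your
> obdfilter device):
>
>     lctl dl                          # list devices and their numbers
>     lctl --device 7 abort_recovery   # abort recovery on OST device 7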
>
> The MDT is not in recovery when it connects to the OST(s)
> ---------------------------------------------------------
> If the MDT is not in recovery at mount time (i.e. it was shut down
> cleanly), but the OST is in recovery, then the MDT will try to get
> information from the OST about existing objects, but fail.  Later in
> the startup process the MDT would incorrectly signal the OST to delete
> all unused objects.  If the MDT is in recovery at startup, then the
> MDT recovery period will expire after the OST recovery and the problem
> will not be triggered.  If the OSTs are mounted and are not in
> recovery when the MDT mounts, the problem will also not be triggered.
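>
> To confirm that an OST has left recovery before the MDT is mounted,
> the recovery state can be checked on the OSS (assuming the
> recovery_status proc file is present in your 1.6 version):
>
>     cat /proc/fs/lustre/obdfilter/*/recovery_status
>
> A status of COMPLETE or INACTIVE means the OST is no longer in
> recovery.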
>
>
> To avoid triggering the problem:
> --------------------------------
> - unmount the clients and MDT before the OST.  When unmounting
> the OST use "umount -f /mnt/ost*" to force disconnect all clients.
> - mount the OSTs before the MDT, and wait for the recovery to timeout
> (or cancel it, as above) before mounting the MDT
> - create at least 20000 objects on each OST, as sketched after this
> list.  Specific OSTs can be targeted via "lfs setstripe -i {OST index}
> /path/to/lustre/file".  These objects do not need to remain on the
> OST; there just have to have been that many objects created on the OST
> at some point, to activate a sanity check when the 1.6.4.1 MDT
> connects to the OST.
> - upgrade to Lustre 1.6.4.2 when it is available
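>
> As a sketch of the object-creation workaround referenced above (the
> mount point and OST index 0 are examples; the files can be removed
> afterwards, because only the object-creation counter matters):
>
>     # create 20000 files whose objects land on OST index 0
>     for i in $(seq 1 20000); do
>         lfs setstripe -i 0 /mnt/lustre/pad.$i
>     done
>     rm -f /mnt/lustre/pad.*
>
> Repeat for any OST index whose last_id is still at or below 20000.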
>
> Cheers, Andreas
> --
> Andreas Dilger
> Sr. Staff Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.

-- 
Harald van Pee

Helmholtz-Institut fuer Strahlen- und Kernphysik der Universitaet Bonn



