[Lustre-discuss] MDT move aka backup w rsync

Andreas Dilger adilger at sun.com
Wed Jul 15 10:43:45 PDT 2009


On Jul 15, 2009  18:35 +0200, Thomas Roth wrote:
> I want to move an MDT from one server to another. After studying some
> mails concerning MDT backup, I've just tried (successfully, it seems) to
> do that on a small test system with rsync:
> 
> - Stop Lustre, umount all servers.
> - Format a suitable disk partition on the new hardware, using the same
>   mkfs options as for the original MDT.
> - Mount the original MDT:     mount -t ldiskfs /dev/sdb1 /mnt
> - Mount the target partition: mount -t ldiskfs -O ext_attr /dev/sdb1 /mnt
> - Copy the data:              rsync -Xav oldserver:/mnt/ newserver:/mnt
> - Umount partitions, restart MGS
> - Mount new MDT
> 
> This procedure was described by Jim Garlick on this list. You might note
> that I used the mount option "-O ext_attr" only on the target machine:
> my mistake perhaps, but no visible problems. In fact, I haven't found
> this option mentioned in any man page or on the net, yet my mount
> command did not complain about it. So I wonder whether it is necessary
> at all, since I seem to have extracted the attributes from the old MDT
> all right without it?

If you have verified that the file data is actually present, this should
work correctly.  In particular, the critical Lustre information is in the
"trusted.lov" xattr, so you need to ensure that is present.  The MDS will
"work" without this xattr, but it will assume all of the files have no
data.
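
A quick spot-check after the copy (a minimal sketch; the file path below
is only an illustration, so substitute any regular file under ROOT/ on
the new MDT's ldiskfs mount):

    # print the Lustre striping xattr for one file, in hex
    getfattr -n trusted.lov -e hex /mnt/ROOT/some/file

If the copy preserved the xattrs this prints a trusted.lov=0x... blob;
a "No such attribute" error means the striping information was lost and
that file's object data would be unreachable.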

> I'm investigating this because our production MDT seems to have a number
> of problems. In particular, the underlying file system is in bad shape,
> with fsck correcting a large number of ext3 errors, incorrect inodes and
> so forth. We want to verify that it is not a hardware issue - a
> bit-flipping RAID controller, silent "memory corruption", whatever. We
> have a DRBD mirror of this MDT running, but of course DRBD just
> reproduces all errors on the mirror. Copying from one ldiskfs to another
> should avoid that?
> 
> The traditional backup method of getting the EAs and tar-ing the MDT
> doesn't finish in finite time. It did before, and the filesystem has
> since grown by a mere 40GB of data, so it shouldn't take that much
> longer - certainly another indication that there is something wrong.
> Of course I have yet to see whether "rsync -Xav" does much better on the
> full system ;-)
> 
> The system runs Debian Etch, kernel 2.6.22, Lustre 1.6.7.1

Direct MDT backup has a problem in 1.6.7.1 due to the addition of the
file size on the MDT inodes: the inodes now appear to hold data, so a
plain tar reads through all of those sparse files.  If you do "--sparse"
backups, this should avoid the slowdown.  You could also try the "dump"
program, which can avoid reading the data from sparse files entirely.
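
A rough sketch of both options (assuming the MDT is mounted as ldiskfs
at /mnt and that /backup has room; the paths are illustrative, and GNU
tar's -S is the short form of --sparse):

    cd /mnt
    # save the extended attributes separately; they can be restored
    # later with "setfattr --restore=/backup/ea.bak"
    getfattr -R -d -m '.*' -e hex -P . > /backup/ea.bak
    # -S skips the all-zero blocks of sparse files, avoiding the
    # size-on-MDT slowdown described above
    tar czSvf /backup/mdt.tgz .

    # alternatively, dump reads the (unmounted) device directly and
    # need not touch the sparse file data at all
    dump -0f /backup/mdt.dump /dev/sdb1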

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.