[Lustre-discuss] MDT backup (using tar) taking very long
Ben Evans
Ben.Evans at terascala.com
Thu Sep 2 06:52:26 PDT 2010
I'd use the dd/gzip option, though you may want to write it to another
system, and have that system do the compression.
If you're going that route, you might want to run fsck on the dd'd image
before compression, to make sure any errors are fixed.
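A minimal sketch of what I mean (the volume, host and path names below are just placeholders, not a known-good recipe):

  # stream the raw snapshot to another host and let that host do the compression
  dd if=/dev/vgmdt/mdt_snap bs=1M | ssh backuphost 'gzip -c > /backup/mdt.img.gz'

  # or copy the image somewhere first, fsck it, then compress it
  dd if=/dev/vgmdt/mdt_snap of=/backup/mdt.img bs=1M
  e2fsck -f -y /backup/mdt.img
  gzip /backup/mdt.img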
-----Original Message-----
From: lustre-discuss-bounces at lists.lustre.org
[mailto:lustre-discuss-bounces at lists.lustre.org] On Behalf Of Frederik
Ferner
Sent: Thursday, September 02, 2010 9:42 AM
To: lustre-discuss at lists.lustre.org
Subject: [Lustre-discuss] MDT backup (using tar) taking very long
Hi List,
We are currently reviewing the backup policy for our Lustre file system,
as backups of the MDT are taking longer and longer.
So far we create an LVM snapshot of the MDT, mount it via ldiskfs, and
run getfattr and getfacl followed by tar (RHEL5 version), basically
following the instructions in the manual. The tar options include
--sparse and --numeric-owner.
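Roughly, the sequence looks like this (volume group, snapshot size and paths are abbreviated placeholders here, not our exact setup):

  lvcreate -s -L 10G -n mdt_snap /dev/vgmdt/mdt            # LVM snapshot of the MDT
  mount -t ldiskfs -o ro /dev/vgmdt/mdt_snap /mnt/mdt_snap
  cd /mnt/mdt_snap
  getfattr -R -d -m '.*' -e hex -P . > /backup/ea.bak      # extended attributes (striping info)
  getfacl -R . > /backup/acl.bak                            # POSIX ACLs
  tar cf - --sparse --numeric-owner . | gzip > /backup/mdt.tar.gz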
At the moment I've got a backup running where the tar process started on
Tuesday, so it has now been running for more than 24h. Including the
getfattr and getfacl calls (running in parallel), the whole backup has
so far taken more than 48h to back up the 700GB MDT of a 214TB Lustre
file system. The tar file created so far is about 2GB, compressed with
gzip.
Tar is currently using anywhere between 30% and 100% CPU according to
top, gzip is below 1% CPU usage, and overall the MDS is fairly idle:
load is about 1.2 on an 8-core machine, and top reports this for the CPUs:
<snip>
Cpu(s): 4.2%us, 4.5%sy, 0.0%ni, 85.8%id, 5.2%wa, 0.0%hi, 0.2%si, 0.0%st
</snip>
vmstat is not showing any I/O worth mentioning, just a few (10-1000)
blocks per second.
Some details of the Lustre file system are below. The MDS is running
Lustre 1.6.7.2.ddn3.5 plus a patch for bz #22820 on RHEL5.
[bnh65367 at cs04r-sc-com01-18 ~]$ lfs df -h
UUID                      bytes     Used  Available  Use%  Mounted on
lustre01-MDT0000_UUID    699.9G    22.1G     677.8G    3%  /mnt/lustre01[MDT:0]
[snip]
filesystem summary:      214.9T   146.6T      68.3T   68%  /mnt/lustre01

[bnh65367 at cs04r-sc-com01-18 ~]$ lfs df -ih
UUID                     Inodes    IUsed      IFree IUse%  Mounted on
lustre01-MDT0000_UUID    200.0M    71.0M     129.0M   35%  /mnt/lustre01[MDT:0]
[snip]
filesystem summary:      200.0M    71.0M     129.0M   35%  /mnt/lustre01
Is this comparable to the backup times other people experience using
tar?
Could this be because tar has to read in the whole file (all zeros)
before deciding that it is a sparse file?
For comparison, a backup using dd and gzip 'only' took about 8h, and
gzip was using 100% of one CPU core for all of that time, so with a
faster compression algorithm this seems a much better option (something
like the sketch below). Are there any dangerous downsides to this
approach that I have missed?
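To illustrate what I mean by a faster compression algorithm (the device path is just a placeholder, and lzop is only one possible choice):

  # lzop trades some compression ratio for much higher throughput
  dd if=/dev/vgmdt/mdt_snap bs=1M | lzop -c > /backup/mdt.img.lzo

  # or stay with gzip, but at its fastest setting
  dd if=/dev/vgmdt/mdt_snap bs=1M | gzip -1 -c > /backup/mdt.img.gz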
Kind regards,
Frederik
--
Frederik Ferner
Computer Systems Administrator phone: +44 1235 77 8624
Diamond Light Source Ltd. mob: +44 7917 08 5110
_______________________________________________
Lustre-discuss mailing list
Lustre-discuss at lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss