[Lustre-discuss] MDT backup

Brian J. Murrell Brian.Murrell at Sun.COM
Tue Jan 20 06:43:25 PST 2009

On Mon, 2009-01-19 at 22:46 -0600, Alex Kulyavtsev wrote:
> Hi,
> what is the right way to backup MDT ?

I think this is covered in the operations manual.

>  People get worried what will  be 
> "The Day After" the catastrophic disk failure.

Indeed.  You should use "reliable" storage under your MDT.  RAID 1
(mirroring, not necessarily limited to only 2 mirrors) is recommended.

> - At present we do LVM snapshots to backup MDT and we were able to 
> restore snapshot to another node.

Excellent.  You have already done more than most people do in planning
and testing your recovery strategy.

> Is there any way to capture changes made on MDT after snapshot done and 
> how close can we get to point of crash ?

Server Changelogs (see the roadmap) might give you what you are looking
for.  Specifically they will be used to implement our Replication
feature.  A replicated copy of your MDT would serve your disaster
recovery requirements as well.  Both of these features are off in the
future however.

> Is there some kind of MDT 
> journal synchronized with LVM snapshot ?


> Is there way to do incremental 
> backups to do it more often ?

You can make snapshots and back them up as frequently as you wish
(subject, obviously, to the bandwidth limitations of your backup
strategy).  You could even do it incrementally by comparing adjacent
snapshots and only backing up the delta between them.
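
To make the "delta between adjacent snapshots" idea concrete, here is a
minimal sketch in Python.  It is only an illustration, not a tested
backup tool: it assumes both snapshots are readable as plain files or
block devices with the same block layout, and the names changed_blocks,
apply_delta and BLOCK_SIZE are made up for this example.

```python
BLOCK_SIZE = 4096  # assumed block size; choose one that suits your device


def changed_blocks(old_path, new_path, block_size=BLOCK_SIZE):
    """Yield (block_index, data) for every block of new_path that
    differs from the block at the same offset in old_path."""
    with open(old_path, "rb") as old, open(new_path, "rb") as new:
        index = 0
        while True:
            new_block = new.read(block_size)
            if not new_block:
                break  # end of the newer snapshot
            old_block = old.read(block_size)
            if new_block != old_block:
                yield index, new_block
            index += 1


def apply_delta(base_path, delta, block_size=BLOCK_SIZE):
    """Replay a delta onto a copy of the older snapshot, in place."""
    with open(base_path, "r+b") as base:
        for index, data in delta:
            base.seek(index * block_size)
            base.write(data)
```

Note that both snapshots still have to be read in full to find the
differences, so this saves backup bandwidth and storage, not read I/O
on the MDS.  It also assumes the newer snapshot is not smaller than the
older one.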

> - What is an experience with DRBD replication ?

I believe there are folks here who are using DRBD for their Lustre

> There are multiple 
> reports from sites using it and there are also there are reports 
> indicating replicated file system is not clean when master MDT crashes 
> as DRBD knows nothing and does not synchronize with file system on top 
> of it.

DRBD should not need to know anything about the filesystem that is on it
any more than Linux RAID (or LVM) needs to know what is on it.

> Is there way to avoid corruption or it just fixed by fsck ?

Of course, if an MDS just up and dies, the MDT, whether it be on the
local disk or a DRBD replicated copy, should be fsck'd to follow best
practices.  That has nothing to do with DRBD though; it is simply
cleaning up a filesystem left in an "open" state before using it again.

> Can DRBD 
> failover "cleanly" if we do it manually e.g. to upgrade master MDS ?
> Can I verify slave disk is consistent with master and is not corrupt 
> after a year of running ?

Those are questions better asked of the DRBD developers I think.  I am
sure you will get more accurate answers from them.

> It seems like both LVM and DRBD approaches are not perfect.

Perfect in what sense?  Every backup solution has its drawbacks,
whether they be performance, cost, data freshness, etc.  I think you
just have to decide what aspects are important for you and budget
accordingly.  A "perfect" backup is probably also a very expensive (in
more aspects than just money) backup.  Your "happy medium" probably lies
somewhere around mirroring whether that be within a single chassis (i.e.
Linux RAID) or remote mirroring such as DRBD, or both.

> Are there 
> plans to implement native replication of MDT in lustre ?

Well, as I said before, Server Changelogs (and perhaps a modification to
the planned "Replication" feature to only replicate meta-data) could
probably be utilized to that end.  Do note, however, that "replication"
is not mirroring: it is more "lazy" (i.e. asynchronous; incoherent) than
mirroring, which is synchronous (and coherent), so there is still a
chance of data staleness.
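
The difference can be shown with a toy model in Python.  This is a
sketch only and says nothing about how Lustre's Replication feature
will actually be implemented; the class names Mirror and LazyReplica
are invented for the illustration, and plain dictionaries stand in for
the two copies of the data.

```python
from collections import deque


class Mirror:
    """Synchronous mirroring: a write returns only after both copies
    have been updated, so the copies never diverge."""

    def __init__(self):
        self.primary = {}
        self.secondary = {}

    def write(self, key, value):
        self.primary[key] = value
        self.secondary[key] = value


class LazyReplica:
    """Asynchronous replication: a write returns once the primary is
    updated; the replica catches up later from a queue of changes."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self.pending = deque()  # changes not yet shipped to the replica

    def write(self, key, value):
        self.primary[key] = value
        self.pending.append((key, value))

    def ship(self, max_updates):
        """Apply up to max_updates queued changes to the replica."""
        for _ in range(min(max_updates, len(self.pending))):
            key, value = self.pending.popleft()
            self.replica[key] = value
```

If the primary is lost while changes are still sitting in pending, the
replica is missing exactly those changes; that is the staleness
referred to above.  The mirror has no such window, at the cost of every
write waiting for both copies.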

> SNS is for OSTs only, right ?

I believe so.

