[Lustre-discuss] Can lustre be trusted to keep my data safe?

Jim Garlick garlick at llnl.gov
Wed May 14 13:21:18 PDT 2008


John,

Lustre can be damn robust if you get the right version on the right
hardware.  Also, I think the new engineering practices and future
architecture that uses ZFS on the back end will only improve this.

That said, your predicament is troubling.  As a general rule I would not
trust any parallel file system that I know of with mission critical data.
Failures do happen; indeed we have lost data in Lustre on several occasions.

In some sense we're in a similar position.  The data we put in Lustre
is important to our mission (well, some of it anyway), costly to regenerate,
and impractical to back up with a general backup policy.

What we do is basically advertise Lustre as temporary scratch space and
provide an HPSS tape archive for users to copy their most critical data to.
That may not work in your case, but if I were you I would at least have
some sort of disaster plan for recovering or regenerating your data.
In short, don't trust Lustre or any parallel file system as the sole
repository for your mission critical data.

Jim

On Wed, May 14, 2008 at 02:21:02PM -0400, jrs wrote:
> Greetings all,
> 
> I just spoke with someone at a large computing company who
> has a close relationship with lustre/sun (a reseller, I guess).
> This person described lustre as being something that Sun
> "would not recommend for mission critical use."
> 
> Can this be true?
> 
> I work for a small/medium company that does image processing.
> We have about 700TB of data presently and might be at 2PB within
> the next couple of years.  Owing to the amount of data, we don't
> make backups for most of it and trust RAID 6 on our hardware RAID
> boxes (Nexsan SATABeast) to fail more slowly than we can replace
> disks.  Over the last couple of years we've had great luck and,
> I believe, have never lost data owing to a failure with this
> hardware (software or human error is another matter ;-).
> However, the unbacked up data is "mission critical."  Though
> it can, probably, all be reconstructed or reacquired, as a practical
> matter losing a significant quantity of this data could be
> catastrophic for our business.
> 
> So, what do you think, can lustre be trusted to keep our
> data safe at our company?  Assume in answering that we have
> failover working properly.  We can also withstand some blocking
> of the filesystem while a failover event completes, i.e., not
> having the filesystem available for some amount of time is
> not a problem, but having directory important-data/ disappear
> is a HUGE problem.
> 
> Thanks for any help or guidance,
> 
> John


