[Lustre-discuss] Can lustre be trusted to keep my data safe?

Jim Garlick garlick at llnl.gov
Wed May 14 17:36:17 PDT 2008


Hi John,

I had assumed, given your requirements, that you would spring for
the service contract.

I think parallel file systems are inherently complicated and Lustre
is competitive in terms of maturity, etc. with other similar products.

Jim

On Wed, May 14, 2008 at 05:21:28PM -0400, jrs wrote:
> Thanks for the insight, Jim (and Mike and Aaron),
> 
> Unfortunately, I've now gotten contradictory views (not terribly
> surprising: people have different views and experiences, etc...).
> 
> Mike (who posted earlier) implied that, if the underlying storage
> and network are solid and failover is done right, it
> can be trusted.
> 
> Jim, would having a support contract change your view?  Or, might
> the progression toward finding that right version/right hardware
> be dangerous even with support?  Is this something related to
> the code's immaturity?  Or just a complex problem?
> 
> thanks much,
> John
> 
> 
> Jim Garlick wrote:
> >John,
> >
> >Lustre can be damn robust if you get the right version on the right
> >hardware.  Also, I think the new engineering practices and future
> >architecture that uses ZFS on the back end will only improve this.
> >
> >That said, your predicament is troubling.  As a general rule I would not
> >trust any parallel file system that I know of with mission critical data.
> >Failures do happen; indeed we have lost data in Lustre on several 
> >occasions.
> >
> >In some sense we're in a similar position.  The data we put in Lustre
> >is important to our mission (well, some of it anyway), costly to
> >regenerate, and impractical to back up with a general backup policy.
> >
> >What we do is basically advertise Lustre as temporary scratch space and
> >provide an HPSS tape archive for users to copy their most critical data to.
> >That may not work in your case, but if I were you I would at least have
> >some sort of disaster plan for recovering or regenerating your data.
> >In short, don't trust Lustre or any parallel file system as the sole
> >repository for your mission critical data.
> >
> >Jim
> >
> >On Wed, May 14, 2008 at 02:21:02PM -0400, jrs wrote:
> >>Greetings all,
> >>
> >>I just spoke with someone at a large computing company who
> >>has a close relationship with lustre/sun (a reseller, I guess).
> >>This person described lustre as being something that Sun
> >>"would not recommend for mission critical use."
> >>
> >>Can this be true?
> >>
> >>I work for a small/medium company that does image processing.
> >>We have about 700TB of data presently and might be at 2PB within
> >>the next couple of years.  Owing to the amount of data we don't
> >>make backups for most of it and trust raid 6 on our hardware raid
> >>boxes (nexsan Satabeast) to fail more slowly than we can replace
> >>disks.  Over the last couple of years we've had great luck and,
> >>I believe, have never lost data owing to a failure with this
> >>hardware (software or human error is another matter ;-).
> >>However, the unbacked up data is "mission critical."  Though
> >>it can, probably, all be reconstructed or reacquired, as a practical
> >>matter losing a significant quantity of this data could be
> >>catastrophic for our business.
> >>
> >>So, what do you think, can lustre be trusted to keep our
> >>data safe at our company?  Assume in answering that we have
> >>failover working properly.  We can also withstand some blocking
> >>of the filesystem while a failover event completes, i.e., not
> >>having the filesystem available for some amount of time is
> >>not a problem, but having directory important-data/ disappear
> >>is a HUGE problem.
> >>
> >>Thanks for any help or guidance,
> >>
> >>John
> >>_______________________________________________
> >>Lustre-discuss mailing list
> >>Lustre-discuss at lists.lustre.org
> >>http://lists.lustre.org/mailman/listinfo/lustre-discuss


