[Lustre-discuss] Can lustre be trusted to keep my data safe?

jrs botemout at gmail.com
Wed May 14 14:21:28 PDT 2008


Thanks for the insight, Jim (and Mike and Aaron),

Unfortunately, I've now gotten contradictory views (not terribly
surprising: people have different views and experiences, etc.).

Mike (who posted earlier) implied that, if the underlying storage
and network are solid and failover is done right, Lustre
can be trusted.

Jim, would having a support contract change your view?  Or, might
the progression toward finding that right version/right hardware
be dangerous even with support?  Is this something related to
the code's immaturity?  Or just a complex problem?

thanks much,
John


Jim Garlick wrote:
> John,
> 
> Lustre can be damn robust if you get the right version on the right
> hardware.  Also, I think the new engineering practices and future
> architecture that uses ZFS on the back end will only improve this.
> 
> That said, your predicament is troubling.  As a general rule I would not
> trust any parallel file system that I know of with mission critical data.
> Failures do happen; indeed we have lost data in Lustre on several occasions.
> 
> In some sense we're in a similar position.  The data we put in Lustre
> is important to our mission (well some of it anyway), costly to regenerate, 
> and impractical to back up with a general backup policy.  
> 
> What we do is basically advertise Lustre as temporary scratch space and
> provide an HPSS tape archive for users to copy their most critical data to.
> That may not work in your case, but if I were you I would at least have
> some sort of disaster plan for recovering or regenerating your data.
> In short, don't trust Lustre or any parallel file system as the sole
> repository for your mission critical data.
> 
> Jim
> 
> On Wed, May 14, 2008 at 02:21:02PM -0400, jrs wrote:
>> Greetings all,
>>
>> I just spoke with someone at a large computing company who
>> has a close relationship with lustre/sun (a reseller, I guess).
>> This person described lustre as being something that Sun
>> "would not recommend for mission critical use."
>>
>> Can this be true?
>>
>> I work for a small/medium company that does image processing.
>> We have about 700TB of data presently and might be at 2PB within
>> the next couple of years.  Owing to the amount of data we don't
>> make backups for most of it and trust raid 6 on our hardware raid
>> boxes (nexsan Satabeast) to fail more slowly than we can replace
>> disks.  Over the last couple of years we've had great luck and,
>> I believe, have never lost data owing to a failure with this
>> hardware (software or human error is another matter ;-).
>> However, the unbacked up data is "mission critical."  Though
>> it can, probably, all be reconstructed or reacquired, as a practical
>> matter losing a significant quantity of this data could be
>> catastrophic for our business.
>>
>> So, what do you think, can lustre be trusted to keep our
>> data safe at our company?  Assume in answering that we have
>> failover working properly.  We can also withstand some blocking
>> of the filesystem while a failover event completes, i.e., not
>> having the filesystem available for some amount of time is
>> not a problem, but having directory important-data/ disappear
>> is a HUGE problem.
>>
>> Thanks for any help or guidance,
>>
>> John
>> _______________________________________________
>> Lustre-discuss mailing list
>> Lustre-discuss at lists.lustre.org
>> http://lists.lustre.org/mailman/listinfo/lustre-discuss


