[Lustre-discuss] Can lustre be trusted to keep my data safe?

Joe Georger jgeorger at ll.mit.edu
Thu May 15 06:09:19 PDT 2008


I think you should be more worried about your hardware.  We have several
Nexsan units, including dual-controller SATABeasts running RAID 6.  A few
months ago one of them had a "glitch" and incorrectly marked 3 disks
as bad within 6 seconds, one more failure than RAID 6 can tolerate.  It
started the rebuild process, then stopped in an unsynchronized state.  We
are running Ibrix (I read this list in case we ever want to switch), and
the file system ended up corrupted.
We lost data, even after spending 3 days running fsck on a 160 TB
filesystem.  Fortunately, Ibrix does not stripe files across OSTs, so the
loss had minimal impact and we were able to restore the 4 TB from
backup.  Maybe Lustre would handle this better; I'm not sure.

The glitch was eventually traced to some bad ECC.  It seemed like
a one-off failure, but the lesson was important: you really need a 2nd
copy if your data is critical.  I've also been told that Nexsan is more
of a "Tier 2" storage vendor.  If a 2nd copy is not feasible,
perhaps consider a more expensive "Tier 1" vendor such as Compellent.

Joe

jrs wrote:
> Greetings all,
>
> I just spoke with someone at a large computing company who
> has a close relationship with Lustre/Sun (a reseller, I guess).
> This person described Lustre as being something that Sun
> "would not recommend for mission critical use."
>
> Can this be true?
>
> I work for a small/medium company that does image processing.
> We have about 700TB of data presently and might be at 2PB within
> the next couple of years.  Owing to the amount of data, we don't
> make backups for most of it and trust RAID 6 on our hardware RAID
> boxes (Nexsan SATABeast) to fail more slowly than we can replace
> disks.  Over the last couple of years we've had great luck and,
> I believe, have never lost data owing to a failure with this
> hardware (software or human error is another matter ;-).
> However, the un-backed-up data is "mission critical."  Though
> it can, probably, all be reconstructed or reacquired, as a practical
> matter losing a significant quantity of this data could be
> catastrophic for our business.
>
> So, what do you think, can Lustre be trusted to keep our
> data safe at our company?  Assume in answering that we have
> failover working properly.  We can also withstand some blocking
> of the filesystem while a failover event completes, i.e., not
> having the filesystem available for some amount of time is
> not a problem, but having directory important-data/ disappear
> is a HUGE problem.
>
> Thanks for any help or guidance,
>
> John


