[Lustre-discuss] failure rates
John White
jwhite at lbl.gov
Fri Apr 24 10:11:30 PDT 2009
On Apr 24, 2009, at 9:59 AM, Brian J. Murrell wrote:
> On Fri, 2009-04-24 at 09:48 -0700, John White wrote:
>>
>> I wonder if anyone has any failure metrics on their specific
>> installations. We're quite new to the Lustre space and wanted to get
>> a feel for what we might be in for downtime-wise. In particular, does
>> anyone have numbers for the mean time between failures (MTBF) and mean
>> time to repair (MTTR)?
>
> I think this is a very subjective question. To a great extent it's going
> to depend on how much you spend on your infrastructure. If you buy
> cheap(ly built) hardware, it will most likely fail more often than
> better-built hardware.
Oh, naturally. I suppose I was short on details. The question is
geared more toward the software side of things. Of course you can build
hardware redundancy into the back end, set up failover on the server
end, etc. Beyond those, I'm curious how often the software unavoidably
"flips out" under Lustre and how long such incidents typically take to
recover from. Say, the lock manager acting up, etc.
I know this is a rather difficult metric to quantify, especially after
our experiences with... other... parallel filesystems. Perhaps people have
numbers for their specific configurations?
>
>
> Additionally, given Lustre's HA abilities, uptime is something you can
> throw money at (or not). If you have a high degree of redundancy in
> your architecture, including failover pairs and so on, then downtime
> is reduced because the redundant hardware kicks in to keep the service
> up in cases where, without that investment, it would have gone down.
>
> There are probably lots of places where the same kind of argument can
> be made, making the question all the more subjective.
>
> b.
>
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss