[Lustre-discuss] Best way to recover an OST

Peter Grandi pg_lus at lus.for.sabi.co.UK
Sun May 23 06:53:56 PDT 2010


>> We encountered a multi-disk failure on one of our mdadm RAID6
>> 8+2 OSTs. 2 drives failed in the array within the space of a
>> couple of hours and were replaced.

There are many reports of multidrive failures, some pretty
impressive e.g. 10 out of 20 on a long-running array after a
restart. Because of common modes, that is not unexpected, as
failures are not uncorrelated (especially when rebuilding!).
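
As a rough illustration of why common modes matter, here is a minimal
sketch that treats drive failures as independent with some per-drive
probability p over a window, and then raises p to stand in for a common
mode such as rebuild stress. The probabilities are invented for
illustration, not measured, and a real correlated-failure model would be
more involved than simply raising p.

```python
from math import comb

def prob_at_least(k, n, p):
    """P(at least k of n drives fail), each failing independently with probability p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Hypothetical figures: a 10-drive array (8+2), which loses data on the 3rd failure.
N_DRIVES = 10
P_BASELINE = 0.001   # assumed per-drive failure probability in the window
P_STRESSED = 0.05    # assumed probability when a common mode is at work

baseline = prob_at_least(3, N_DRIVES, P_BASELINE)
stressed = prob_at_least(3, N_DRIVES, P_STRESSED)

print(f"P(>=3 of {N_DRIVES} fail), baseline p={P_BASELINE}: {baseline:.3e}")
print(f"P(>=3 of {N_DRIVES} fail), stressed p={P_STRESSED}: {stressed:.3e}")
print(f"ratio: {stressed / baseline:.0f}x")
```

The point is only qualitative: a modest rise in the per-drive probability
makes a multi-drive failure disproportionately more likely.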

> I guess the need for +3 parity is closer than we think...

Some people are pushing this, and I guess that you are thinking
about the arguments here:

  http://blogs.sun.com/ahl/entry/acm_triple_parity_raid

But I think it is simply stupid -- adding more parity makes
things slower and less reliable (e.g. more complexity),
especially if one takes "advantage" of the false sense of
security of more parity to have wider arrays. I'd rather have, in
the few cases where it makes sense, a narrower RAID5 than a wider
RAID6, for example (e.g. two 4+1 RAID5s instead of one 8+2 RAID6).
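
To put some numbers on that comparison, here is a small sketch using the
usual textbook figures: a read-modify-write of one data block costs a
read and a write for the data block and for each parity block, and
rebuilding one failed member needs to read at least as many surviving
members as there are data drives. The function name is made up for this
example, and these are idealized counts, not measurements of mdadm or
any particular controller.

```python
def layout_costs(data_drives, parity_drives):
    """Idealized per-array costs for a parity RAID set (textbook figures)."""
    return {
        "drives": data_drives + parity_drives,
        "usable": data_drives,
        # Minimum number of surviving members read to rebuild one failed drive.
        "rebuild_reads": data_drives,
        # Read-modify-write of one block: read + write the data block
        # and each parity block.
        "small_write_ios": 2 * (1 + parity_drives),
    }

raid5_4p1 = layout_costs(4, 1)   # one of two 4+1 RAID5 sets
raid6_8p2 = layout_costs(8, 2)   # a single 8+2 RAID6 set

print("two 4+1 RAID5:", {k: (2 * v if k in ("drives", "usable") else v)
                         for k, v in raid5_4p1.items()})
print("one 8+2 RAID6:", raid6_8p2)
```

Both layouts use 10 drives for 8 drives' worth of capacity, but under
these assumptions the RAID5 sets need 4 I/Os per small write and read 4
members per rebuild, against 6 and 8 for the RAID6 set.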

The usual arguments apply: http://WWW.BAARF.com/ plus that
"stupid" is usually rewarded by "management" who see the obvious
reduction in cost but don't see those in performance, simplicity
and reliability.

Note that one argument in the page above is "fills a niche", and
as long as it is acknowledged that it is a minuscule niche it is
fine; but then "need for +3 parity" is a rather wider statement.

If an 8+2 array had 2 drive failures, perhaps instead of looking
at more parity it would be better to look at common modes of
failure; and not just vibration, heat or electrical common modes,
but also the thoroughly moronic practice of many RAID vendors
(e.g. EMC, DDN, NexSAN by my direct experience, but most/all do
that) to put into their arrays drives not only of the same
manufacturer and model, but even with nearly consecutive serial
numbers from the same delivery and even the same carton.

And in any case if one uses something like Lustre 1.x, which is a
parallel metafilesystem with no data redundancy (and for very
good reasons, and mirroring in 2.x is something that I have very
mixed feelings about), using parity RAID is doubly stupid, as the
storage layer has to provide all the redundancy.

And in any case one cannot build storage systems that never fail;
what matters more is what happens when they do fail. As to this,
fortunately, Lustre does pretty well.


