[Lustre-discuss] Failover / reliability using SAS direct-attached storage

Thu Jul 21 14:42:47 PDT 2011

Apologies if this is a bit of a newbie question; I'm really just getting
started. I'm still in the design/testing stage and trying to wrap my head
around a few things.

I'm most familiar with Fibre Channel storage. As I understand it, you
configure a pair of OSSes for each OST: one actively serves it, while the
other waits passively in case the primary OSS fails. Please correct me if
I'm wrong...

With SAS/SATA direct-attached storage (DAS), though, it's less clear to me.
With SATA, I imagine that if an OSS goes down, all of its OSTs go down with
it (whether they are internal or externally mounted drives), since there is
no multipathing. I also suppose I'd want a hardware RAID controller (a PCIe
card), which would likewise preclude failover, since its cache and
configuration won't be mirrored to another OSS's RAID card.

With SAS, there seems to be a newer way of doing this that I'm just starting
to learn about, but it's still a bit fuzzy to me. With things like Storage
Bridge Bay storage servers from vendors such as Supermicro, you can put two
server motherboards in one enclosure, with an internal 10GigE link between
them to keep their caches coherent and some sort of software layer to manage
that (?). You can then use inexpensive SAS drives both internally and in
external JBOD chassis. Is anyone using something like this with Lustre?

Or perhaps I'm not seeing the forest for the trees, and Lustre has built-in
software features that remove the need for this (such as parity of objects
at the server level, so you could lose one OSS out of N+1)? Bottom line,
what I'm after is an architecture that works with inexpensive internal
and/or JBOD SAS storage and won't risk data loss from the failure of a
single drive or a server's RAID array...
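To make the server-level parity idea above concrete, here is a minimal Python sketch of single XOR parity across servers. It assumes each of N hypothetical servers holds one equal-sized chunk of an object and one extra server holds the XOR of those chunks. It only illustrates the N+1 concept being asked about; it is not a description of any Lustre feature, and the function names are made up for illustration.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length byte blocks together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_parity(data_blocks: list[bytes]) -> bytes:
    """Hypothetical parity chunk stored on the '+1' server."""
    return xor_blocks(data_blocks)


def rebuild_missing(surviving_blocks: list[bytes]) -> bytes:
    """With single XOR parity, the one missing chunk equals the XOR of all
    surviving chunks (the remaining data chunks plus the parity chunk)."""
    return xor_blocks(surviving_blocks)


# Hypothetical stripe: N = 3 data servers plus 1 parity server.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = make_parity(data)

# Simulate losing the second data server and rebuilding its chunk.
lost_index = 1
survivors = [blk for i, blk in enumerate(data) if i != lost_index] + [parity]
recovered = rebuild_missing(survivors)
print(recovered)  # b'BBBB'
```

Single parity like this survives the loss of any one server, but not two at once, and every rebuild has to read from all the survivors.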

Thanks,

Tyler