[Lustre-discuss] Problems with failover

Thu Jan 3 19:59:15 PST 2008

> You currently need another mechanism (hardware or software RAID) to
> provide data redundancy in case of disk failure.  We are working to
> provide data replication at the Lustre level, but that is not yet
> available.

I should say. That technology has me pretty excited. Right now, unless I
bend over backwards and do something like "vertical" RAID stripes/mirrors
across multiple disk trays in a storage cluster, I can end up in a very bad
situation if I lose an entire tray. That can have a devastating impact on
my entire storage tier.

A few companies here and there (XIV, Isilon) are starting to abandon
hardware RAID and are doing block replication across the entire storage
cluster. With that, I can forget about worrying over specific disks (except
to replace them), and I don't even have to worry about whole trays (insofar
as I have spare capacity).
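
To make the tray point concrete, here is a minimal sketch of tray-aware
replica placement. It is not how XIV, Isilon, or Lustre actually do it; the
node and tray names, the place_replicas() and survives_tray_loss() helpers,
and the choice of hash-based ranking are all assumptions made up for
illustration. The only idea it shows is that every block keeps its copies on
different trays, so losing one tray never takes out all copies of a block.

```python
import hashlib

# Hypothetical cluster layout: (node name, tray name). Purely illustrative.
NODES = [
    ("node1", "trayA"), ("node2", "trayA"),
    ("node3", "trayB"), ("node4", "trayB"),
    ("node5", "trayC"), ("node6", "trayC"),
]

def place_replicas(block_id, nodes, copies=2):
    """Pick `copies` nodes for a block, each on a different tray."""
    # Rank nodes by a per-block hash so placement is deterministic
    # and different blocks land on different nodes.
    ranked = sorted(
        nodes,
        key=lambda n: hashlib.sha256(f"{block_id}:{n[0]}".encode()).hexdigest(),
    )
    chosen, used_trays = [], set()
    for name, tray in ranked:
        if tray in used_trays:
            continue  # never put two copies of a block on one tray
        chosen.append((name, tray))
        used_trays.add(tray)
        if len(chosen) == copies:
            return chosen
    raise ValueError("not enough distinct trays for the requested copies")

def survives_tray_loss(placements, lost_tray):
    """True if every block still has at least one copy off the lost tray."""
    return all(
        any(tray != lost_tray for _, tray in replicas)
        for replicas in placements.values()
    )

# Place 100 blocks with two copies each.
placements = {b: place_replicas(b, NODES) for b in range(100)}
```

With this layout, survives_tray_loss(placements, "trayA") holds for any
single tray, which is the property the "vertical" RAID layout is trying to
buy by hand.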

This is a pretty neat capability. If you add to it the ability to
"rebalance" your cluster on the fly as new nodes are added, what you end up
with is a self-healing storage cluster. That is pretty compelling for
availability figures, and it can help with the disk-service pattern as well.

Joe Kraska
San Diego CA
USA