[Lustre-discuss] Question on lustre redundancy/failure features

Tue Jun 29 00:40:18 PDT 2010

> Lustre is not the solution I'm looking for. I was hoping to use
> it as an easily expandable storage cluster

But it is that; it is just a particular type of storage cluster
system with a specific performance profile.

As to "expandable", consider again whether your requirements
really involve a single storage pool, or whether you could use
multiple instances.

> with the equivalent of network RAID 5 across 3 machines with
> RAID 1 physical disks.

That seems to me quite a peculiar setup, with strong performance
anisotropy, and I find it difficult to imagine the requirements
driving it.
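To put a number on one cost of that layout, here is a rough sketch (my own illustration, not something stated in this thread) of usable capacity. It assumes equally sized nodes, each mirroring its disks with RAID 1, with RAID 5 parity striped across the nodes; `usable_fraction` and the disk sizes are hypothetical.

```python
def usable_fraction(nodes: int) -> float:
    """Fraction of raw disk capacity that remains usable under network
    RAID 5 across `nodes` equally sized machines, each of which mirrors
    its own disks with RAID 1. Hypothetical helper for illustration."""
    if nodes < 3:
        raise ValueError("RAID 5 needs at least 3 members")
    mirror_efficiency = 1 / 2                  # RAID 1: every block is stored twice
    parity_efficiency = (nodes - 1) / nodes    # RAID 5: one member's worth holds parity
    return mirror_efficiency * parity_efficiency


if __name__ == "__main__":
    raw_tb = 3 * 2 * 2.0  # hypothetical: 3 nodes x 2 disks x 2 TB each
    frac = usable_fraction(3)
    print(f"usable fraction: {frac:.3f}, usable TB: {raw_tb * frac:.1f}")
```

So only about a third of the raw disk is usable, and a small write has to update both a data block and a parity block on two different machines, each of which is then written twice by the local mirror.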

> This storage cluster/SAN would then hold VM images for several
> VM servers.

The boot images themselves can be relatively small. What about the
data storage for those VMs? Will it be virtual disks (more images),
or will each VM mount its filesystems from a NAS server (e.g.
Lustre) while it boots?

> This way, I thought it would make recovery of any machine
> easy, I just have to mount the network storage on a
> working/replacement server and boot up the VMs originally
> hosted on a failed server.

Ah, that's an interesting point, as you have implicitly stated
some availability requirements and expected failure modes.
Apparently you don't need continuous VM availability, and
recovery can be manual and take some time. You also assume that
loss of a compute server is more likely, or easier to recover
from, than loss of a storage server or a storage device (even
though you want to provide two levels of redundancy). You also
seem to imply that network latency and bandwidth are not a big
issue for VM performance.

> Somebody else pointed out that I might be looking for
> OpenFiler instead.

Or perhaps GlusterFS. Or perhaps re-examine your requirements
and simplify your design a bit.

The ideal application for Lustre is massively parallel
(many-to-many) I/O on large, sequentially accessed datasets, and
it becomes less suitable the further a workload departs from
that. Its scalability is bought at the price of network latency
and traffic (in this respect it is a smaller-scale version of
GoogleFS, where the tradeoff is even more extreme), and of careful
design of the underlying storage layer (in this respect GoogleFS
is the opposite).

It can also do decently the same workloads that