[Lustre-discuss] Question on lustre redundancy/failure features

Peter Grandi pg_lus at lus.for.sabi.co.UK
Tue Jun 29 01:25:38 PDT 2010


>>> I'm looking at using Lustre to implement a centralized
>>> storage for several virtualized machines.

>> That's such a cliche, and Lustre is very suitable for it if
>> you don't mind network latency :-), or if you use a very low
>> latency fabric.

>> [ ... ] choose to provide resource virtualization and
>> parallelization at the lower levels of abstraction (e.g. block
>> device) and not at the higher ones (service protocol), thus
>> enjoying all the "benefits" of centralization. But then
>> probably they don't care about availability and in particular
>> about latency (and sometimes not even about throughput).

> Am I correct to understand that you mean the approach I am
> considering is stupid then?

Not quite: there are some legitimate applications in which
availability, latency or throughput matter less than other
goals, and for those low-level virtualization and
parallelization are acceptable design choices.

But it is difficult for me to imagine requirements that would
justify a choice of network RAID5 on top of RAID1 arrays.

> [ ... ] pointers in the right direction :)

It depends on the requirements, on what the priorities are, and
on the budget for the hardware layer.

> What do you mean by higher levels of abstraction and benefits
> of centralization?

Well, consider the case of something like a data repository,
e.g. RDBMS tablespaces or local mail store.

The alternative is between virtualizing and sharing the disks
using a low-level block-oriented protocol (e.g. GFS/GFS2), or
having two redundant RDBMS or mail storage systems, each with
its own local storage and application-specific synchronization;
that is, whether to virtualize the storage used by the service,
or the service itself. I think the latter is preferable in most
cases. Another popular choice is to have a central SAN server, a
central NAS (NFS, Lustre, ...) server using it, and a central
compute or timesharing server mounting the latter, instead of
three computers each with local storage and filesystem, each
serving a third of the load.

Network latency and throughput limitations usually matter more
than realtime continuous sharing and availability, and unless
one wants to invest in HPC-style fabrics, network latency and
throughput issues are best avoided; local access at low levels
of abstraction/virtualization is vastly preferable.

  Note: there are some people who do need massive shared systems
  with very high continuous realtime sharing and availability
  requirements, and there are very expensive and difficult ways
  to address those requirements properly.

> Would it be correct to understand that to mean instead of
> trying to provide redundant storage, I should be looking at
> providing several servers that would simply fail over to each
> other? e.g.
> S1 (VM1, VM2, VM3) failover to S2
> S2 (VM4, VM5, VM6) failover to S3
> S3 (VM7, VM8, VM9) failover to S1

I presume that this means that S1 is running VM1, VM2, VM3 from
local disks.

This might be a good alternative, and you could use DRBD to
mirror the images in realtime across machines. The advantage
would be much lower network latency (with the "main" image on
local storage and only writes, which can be queued, going over
the network) and much less network traffic (all reads being
local).
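
As a rough sketch of what that could look like, here is a
minimal DRBD resource definition mirroring one VM's image volume
between two hosts; the hostnames, device and volume names are
made up for illustration, and protocol A is used here to match
the "queued writes" behaviour (protocol C would give synchronous
mirroring instead):

  resource vm1 {
    protocol A;                # asynchronous: writes are queued to the peer
    on s1 {
      device    /dev/drbd0;
      disk      /dev/vg0/vm1;  # local LV holding the VM1 image
      address   10.0.0.1:7788;
      meta-disk internal;
    }
    on s2 {
      device    /dev/drbd0;
      disk      /dev/vg0/vm1;  # same-sized LV on the mirror host
      address   10.0.0.2:7788;
      meta-disk internal;
    }
  }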

Another issue is whether you have different requirements for the
VM images (e.g. the '/' filesystem) and/or the filesystems they
access (e.g. '/home' or '/var/www'), and whether the latter
should be shared across two or more VMs; in that case a network
filesystem could be handy, and Lustre is a good choice even if
one does not need its massively parallel (many-to-many)
streaming performance.
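
As a very rough sketch, assuming (purely for illustration) a
Lustre filesystem named 'home' whose MGS is reachable at
mgs@tcp0, each VM that needs the shared area would simply mount
it as a Lustre client:

  # on each VM that needs the shared area (names are illustrative)
  mount -t lustre mgs@tcp0:/home /home

  # or the equivalent /etc/fstab entry:
  mgs@tcp0:/home  /home  lustre  defaults,_netdev  0 0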

  Note: for VMs regrettably block level virtualization over the
  network might be better than mounting filesystems over the
  network, because in the former case the network traffic is
  done by the real system, in the latter by the virtual system,
  and many VM implementations don't do network traffic that well.
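
  For example, with KVM/libvirt one could hand the replicated
  block device to the guest as a virtio disk, so the replication
  traffic is handled by the real host and the guest only sees a
  local disk (device path and target name are hypothetical):

    <disk type='block' device='disk'>
      <driver name='qemu' type='raw'/>
      <source dev='/dev/drbd0'/>
      <target dev='vda' bus='virtio'/>
    </disk>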


