[Lustre-discuss] Question on lustre redundancy/failure features

Sat Jun 26 14:13:23 PDT 2010

I'm looking at using Lustre to implement centralized storage for
several virtualized machines. The key considerations are reliability
and ease of increasing/replacing capacity.

However, I'm still quite confused and haven't read the manual fully
because I'm tripping on this: what exactly happens if a piece of
hardware fails?
Perhaps it's because I haven't yet tried to set up Lustre, so the
terms used don't quite translate for me yet. I'd appreciate some
newbie hand holding here :)

For example, say I have a simple 5-machine cluster: one MDS/MDT and
one failover MDS/MDT, plus three OSS/OST machines with 4 drives each.
Each OSS would have 2 MD RAID 1 block devices, for a total of 6 OSTs,
if I understand the term correctly.
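
To make the arithmetic behind that count explicit, here is a minimal
sketch in plain Python. It assumes each MD RAID 1 mirror pairs two
drives and that each mirror is exported as one OST; the function name
count_osts is made up for illustration and is not part of Lustre.

```python
def count_osts(oss_machines, drives_per_oss, drives_per_mirror=2):
    """Count OSTs, assuming one OST per MD RAID 1 mirror (an assumption)."""
    mirrors_per_oss = drives_per_oss // drives_per_mirror
    return oss_machines * mirrors_per_oss


# Three OSS machines with 4 drives each, mirrored in pairs.
print(count_osts(oss_machines=3, drives_per_oss=4))  # 6
```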

What happens if one of the OSS/OST dies, say motherboard failure?
Because the manual mentions data striping across multiple OSTs, it
sounds like either networked RAID 0 or RAID 5.

In the case of network RAID 0, a single machine failure means the
whole cluster is dead. It doesn't seem to make sense for Lustre to
fail in this manner, whereas if Lustre implemented network RAID 5,
the cluster would continue to serve all data despite the dead machine.
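
The difference between the two cases can be sketched as a toy model.
This is plain Python, not Lustre code, and it makes no claim about
which behaviour Lustre actually has. It assumes data is striped across
a set of members (here the three OSS machines, named oss1..oss3 purely
for illustration): without redundancy every member is needed, while
with single parity the loss of one member is tolerated.

```python
def available_raid0(stripe_members, failed):
    """Striping without redundancy: data needs every stripe member."""
    return not (set(stripe_members) & set(failed))


def available_raid5(stripe_members, failed):
    """Single parity: data survives the loss of at most one member."""
    return len(set(stripe_members) & set(failed)) <= 1


members = ["oss1", "oss2", "oss3"]  # hypothetical machine names
failed = ["oss2"]                   # e.g. a motherboard failure

print(available_raid0(members, failed))  # False
print(available_raid5(members, failed))  # True
```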

Yet the manual warns that Lustre does not have redundancy and relies
entirely on some kind of hardware RAID being used. So it seems to
imply that network RAID 0 is what's implemented.

This appears to be the case given the example in the manual of a
simple combined MGS/MDT with two OSS/OST which uses the same fsname
"temp" for the OSTs, which then combines the two 16MB OSTs into a
single 30MB block device mounted as /lustre on the client.

Does this then mean that if I want redundancy on the storage, I would
basically need to have a failover machine for every OSS/OST?

I'm also confused because the manual says an OST is a block device
such as /dev/sda1, but that an OSS can be configured to provide
failover services. If the OSS machine which houses the OST dies, how
would another OSS take over, since it would not be able to access the
dead machine's data?

Or does that mean this functionality is only available if the OSTs in
the cluster are standalone SAN devices?