[Lustre-discuss] Redundancy with object storage?
D. Dante Lorenso
dante at lorenso.com
Tue Dec 4 15:59:34 PST 2007
Brian J. Murrell wrote:
> On Mon, 2007-12-03 at 20:32 -0600, D. Dante Lorenso wrote:
>> The problem that I see is that if any one
>> piece of the 4-node system fails, the whole system will fail.
>
> Not quite. If an OST fails, only the objects on that OST become unavailable.
> The filesystem will continue to run and service requests from the
> available OSTs. That means, depending on striping policies, that some
> or all of a file's contents might not be available.
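A minimal Python sketch of the striping point, under a simplified model: a file is striped RAID-0 style across `stripe_count` OSTs in round-robin order, in chunks of `stripe_size` bytes, starting at some OST index. The consecutive OST placement and the helper names `file_osts` and `lost_ranges` are assumptions for illustration only, not Lustre's actual allocator or API. It shows which byte ranges of a file become unreadable when one OST fails.

```python
from typing import List, Tuple


def file_osts(start_ost: int, stripe_count: int, num_osts: int) -> List[int]:
    """OST indices holding a file's objects (assumes consecutive placement)."""
    return [(start_ost + i) % num_osts for i in range(stripe_count)]


def lost_ranges(file_size: int, stripe_size: int, osts: List[int],
                failed_ost: int) -> List[Tuple[int, int]]:
    """Byte ranges [start, end) of the file that live on failed_ost.

    Chunk k of the file (stripe_size bytes each) is assumed to be stored
    on osts[k % len(osts)], i.e. round-robin RAID-0 striping.
    """
    ranges = []
    for chunk, start in enumerate(range(0, file_size, stripe_size)):
        if osts[chunk % len(osts)] == failed_ost:
            ranges.append((start, min(start + stripe_size, file_size)))
    return ranges


MiB = 1 << 20
osts = file_osts(start_ost=1, stripe_count=2, num_osts=4)  # [1, 2]
lost = lost_ranges(10 * MiB, MiB, osts, failed_ost=2)
print(f"{len(lost)} of 10 one-MiB chunks unreadable")
```

With `stripe_count=1` a file lives entirely on one OST, so it is either fully readable or fully lost; wider striping means more files are touched by a single OST failure, each losing only some of its chunks.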
What happens when you try to read a file from an OST that is down? I'm
guessing the read will hang for a considerable period of time. That
hanging will likely occur for many files on a box simultaneously, and
eventually the whole box will lock up waiting on I/O it will never
get ... essentially taking the whole shebang down.
>> Is it possible to configure Lustre to write Objects to more than 1 node
>> simultaneously such that I am guaranteed that if one node goes down that
>> all files are still accessible?
> That's called RAID, and right now, no. It's on the roadmap though.
Is the roadmap posted somewhere? URL? Any timeline I might want to
watch and wait for?
>> This would effectively mean that I
>> would use 2 times the storage space for each object written and would
>> require that every cluster have a minimum of 2 nodes.
> This is a description of mirroring.
Right, like RAID 1, but at the network level.
>> I understand the concepts of using DRBD to replicate block devices, as
>> well as creating a fully separate cluster for failover, but I'm hoping
>> to build redundancy into a single cluster without having to duplicate my
>> network with a bunch of active/passive machine combinations.
>
> Using a reliable (some form of RAID -- which drbd qualifies as) shared
> storage device is the only way to mitigate the SPOF scenario with Lustre
> currently.
>
> As far as duplication, etc., there is no reason why you cannot mirror
> your DRBD devices amongst the hardware you have currently (i.e. pair your
> machines up and create DRBD-based, mirrored devices) and combine that with
> some form of failover (e.g. Linux-HA heartbeat) to get some redundancy.
>
> If you were already resigned to halving your storage by mirroring OSTs
> at the Lustre layer, dropping that mirroring down to drbd should not
> impose any more significant costs. Or maybe I'm misunderstanding your
> concerns with using drbd.
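A rough Python model of the pairing suggestion, assuming each pair of machines mirrors one OST device (DRBD-style), the first node of a pair normally serves it, and a failover manager moves it to the partner when that node dies. The node names and the helpers `pair_nodes`, `serving_node` and `usable_capacity` are hypothetical; nothing here talks to DRBD or heartbeat. It shows both halves of the trade-off: every OST stays served after a single node failure, and usable capacity is half the raw total.

```python
from typing import List, Optional, Set, Tuple


def pair_nodes(nodes: List[str]) -> List[Tuple[str, str]]:
    """Pair machines up; each pair mirrors one OST device (DRBD-style)."""
    if len(nodes) % 2:
        raise ValueError("mirroring needs an even number of nodes")
    return [(nodes[i], nodes[i + 1]) for i in range(0, len(nodes), 2)]


def serving_node(pair: Tuple[str, str], failed: Set[str]) -> Optional[str]:
    """Primary serves the OST; the partner takes over if the primary fails."""
    for node in pair:
        if node not in failed:
            return node
    return None  # both halves of the mirror are down


def usable_capacity(per_node: int, num_nodes: int) -> int:
    """Mirroring stores every block twice, so usable space is half of raw."""
    return per_node * num_nodes // 2


pairs = pair_nodes(["oss1", "oss2", "oss3", "oss4"])  # hypothetical hosts
print([serving_node(p, {"oss1"}) for p in pairs])  # ['oss2', 'oss3']
print(usable_capacity(500, 4))  # 1000, half of the raw 2000 units
```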
I have configured a DRBD system with heartbeat in my lab tests and it
seems to work well enough, but I haven't tied it into Lustre just yet.
I'm concerned about the fragility of a system that requires all three
(Lustre, DRBD, and heartbeat) to magically work in unison.
It is a delicate mounting/unmounting game to ensure that partitions are
monitored, mounted, and failed over in just the right order. What I was
hoping for was to eliminate all those moving parts by using a single
solution like Lustre.
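A small Python sketch of the ordering problem, modeled loosely on heartbeat resource groups: resources start in the listed order and stop in the reverse order. Rolling back already-started resources when a later start fails is an assumption of this sketch, not a claim about heartbeat's exact behavior. The step names (`drbd-promote`, `mount-ost`) and the no-op actions are hypothetical placeholders; a real setup would invoke the DRBD, mount and Lustre tooling instead.

```python
from typing import Callable, List, Tuple

# (name, start_action, stop_action); actions are placeholders in this sketch
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def stop_all(steps: List[Step], log: List[str]) -> None:
    """Stop resources in reverse order of how they were started."""
    for name, _start, stop in reversed(steps):
        stop()
        log.append(f"stopped {name}")


def start_all(steps: List[Step], log: List[str]) -> bool:
    """Start resources in order; on failure, roll back what already started."""
    started: List[Step] = []
    for step in steps:
        name, start, _stop = step
        try:
            start()
        except Exception as exc:
            log.append(f"start {name} failed: {exc}")
            stop_all(started, log)
            return False
        log.append(f"started {name}")
        started.append(step)
    return True


def mount_fails() -> None:
    raise RuntimeError("device busy")  # simulated failure for the demo


log: List[str] = []
steps: List[Step] = [
    ("drbd-promote", lambda: None, lambda: None),  # hypothetical step
    ("mount-ost", mount_fails, lambda: None),      # hypothetical step
]
print(start_all(steps, log))  # False
print(log)
```

Keeping the order in one list means start and stop can never disagree about sequencing, which is most of what makes the mount/unmount dance fragile when done by hand.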
I'm leaning toward the Lustre + DRBD + heartbeat (L,D,H) solution, but I was
really hoping for something easier. Are there any online howtos that
demonstrate that configuration?
-- Dante