[Lustre-discuss] one server node fails, its all dead?

Andreas Dilger adilger at sun.com
Mon Feb 2 13:57:57 PST 2009


On Feb 02, 2009  15:23 -0600, Robert Minvielle wrote:
> So, if I have a server that goes down, the clients are out of luck. I have
> a hard time believing this is "acceptable". Ok, so it is "as good as" NFS,
> but I mean really, if a single storage unit fails all of my clients can do
> nothing? Am I missing something here or is this by design?

It is possible for clients to create new files while a server is down,
but as you would expect it isn't possible to read any data stored on the
failed server until it is back.  In some cases users have used DRBD to
replicate the server's storage device to a second node instead of using
shared storage, so that the second node can take over if the first fails.
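
As a rough illustration of the first point, here is a toy Python model.
It is not Lustre code: the class, its methods and the file names are made
up for this sketch, and the round-robin placement is an assumption chosen
for simplicity, not a description of Lustre's actual allocator.  It only
shows that a file is unreadable if any of its stripes sits on a failed
OST, while new files can still be placed on the OSTs that remain.

```python
# Toy model only: none of these names are Lustre APIs.
from dataclasses import dataclass, field


@dataclass
class ToyFilesystem:
    ost_count: int
    stripe_count: int = 2
    failed: set = field(default_factory=set)     # indices of OSTs that are down
    layouts: dict = field(default_factory=dict)  # file name -> list of OST indices
    next_ost: int = 0                            # round-robin cursor (assumed policy)

    def create(self, name):
        """Place a new file on healthy OSTs only, skipping failed ones."""
        healthy = [i for i in range(self.ost_count) if i not in self.failed]
        if not healthy:
            raise RuntimeError("no OSTs available")
        width = min(self.stripe_count, len(healthy))
        start = self.next_ost % len(healthy)
        stripes = [healthy[(start + k) % len(healthy)] for k in range(width)]
        self.next_ost += width
        self.layouts[name] = stripes
        return stripes

    def readable(self, name):
        """A file is readable only if none of its stripes is on a failed OST."""
        return not any(ost in self.failed for ost in self.layouts[name])


fs = ToyFilesystem(ost_count=4)
fs.create("a")          # striped over OSTs 0 and 1
fs.create("b")          # striped over OSTs 2 and 3
fs.failed.add(1)        # OST 1 goes down
print(fs.readable("a"), fs.readable("b"))
print(fs.create("c"))   # still succeeds, using only the remaining OSTs
```

In this model file "a" becomes unreadable when OST 1 fails, file "b" is
unaffected, and file "c" can still be created.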

> The real reason
> I ask is that I am testing Lustre against a few other DPFS to see if we will
> move to Lustre. So far, some things are nice, and some are not nice. Writing
> seems to be faster, but reading is slower (than my other test DPFSs). 
> Contacting Sun to ask about support took forever. At least four days for them
> to just call me back and tell me they could not give me a price without 
> knowing how much storage I have (ugh, a pay per byte system, great). 

You can imagine that supporting the largest Lustre filesystem (1300+ OSTs
with 10PB of storage and 30k+ clients) will take more effort than supporting
a system with a handful of OSTs and clients.  The support price is not per
client, but rather per-OST, IIRC.

> So, Lustre users, is it worth it? My setup would be 24 OST's with about
> 100TB of storage, 10G ethernet, RAID on each OST, at least 20 or so clients
> needing pretty fast read/write, connected via 10G ethernet (yes, I know I 
> need a SAN but the physical locations will not allow it and the price is
> prohibitive, hence my looking at DPFSs)... Am I on the right track looking
> at Lustre, or should I go elsewhere? I also need commercial support of some
> kind (although it seems Sun is unsure of themselves here, they did not 
> know who to contact when I contacted them "Lustre, we make a product
> called Lustre? Hold please"... 

Well, Sun is a big company, and Lustre was only acquired a year ago and
does not generate a high call volume for the L1 support people, so they
will not necessarily have the information immediately to hand.

Note that Lustre itself does NOT need a SAN to work, unlike some other
cluster filesystems.  Shared storage (a SAN or similar) is only required
for failover pairs of servers, where both servers in the pair need access
to the same disks.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.



