[Lustre-discuss] one server node fails, it's all dead?

Robert Minvielle robert at lite3d.com
Mon Feb 2 13:23:46 PST 2009


More testing today. I downed a server (OST) to see what would happen. Well,
it does follow the FAQ :) The FAQ states:

-- begin FAQ --
I don't need failover, and don't want shared storage. How will this work?

If Lustre is configured without shared storage for failover, and a server node fails, then a client that tries to use that node will pause until the failed server is returned to operation. After a short delay (a configurable timeout value), applications waiting for those nodes can be aborted with a signal (kill or Ctrl-C), similar to the NFS soft-mount mode.

When the node is returned to service, applications which have not been aborted will continue to run without errors or data loss.

-- end FAQ --
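
For illustration only, here is a minimal sketch of how an application
might bound its own wait instead of relying on someone to hit Ctrl-C. This is
not Lustre code: run_with_deadline, StalledIO and slow_read are hypothetical
names, and slow_read just sleeps to stand in for a read that blocks while a
server is down. It assumes a Unix system and must run in the main thread.
Per the FAQ, a real blocked read may only become interruptible after Lustre's
own timeout has expired, so the deadline here is an upper bound on
patience, not a guarantee.

```python
import signal
import time


class StalledIO(Exception):
    """Raised when the guarded call runs longer than the allowed time."""


def run_with_deadline(func, seconds, *args):
    """Run func(*args); raise StalledIO if it takes longer than `seconds`."""
    def on_alarm(signum, frame):
        raise StalledIO(f"no response after {seconds}s")

    previous = signal.signal(signal.SIGALRM, on_alarm)
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        return func(*args)
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)   # cancel any pending alarm
        signal.signal(signal.SIGALRM, previous)   # restore the old handler


def slow_read(delay):
    # Hypothetical stand-in for a read that blocks while a server is down.
    time.sleep(delay)
    return b"data"
```

Calling run_with_deadline(slow_read, 0.1, 2.0) raises StalledIO, while a
call that finishes in time simply returns its result.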

So, if I have a server that goes down, the clients are out of luck. I have
a hard time believing this is "acceptable". OK, so it is "as good as" NFS,
but really, if a single storage unit fails, all of my clients can do
nothing? Am I missing something here, or is this by design? The real reason
I ask is that I am testing Lustre against a few other DPFSs to see if we will
move to Lustre. So far, some things are nice and some are not. Writing
seems to be faster, but reading is slower than on my other test DPFSs.
Contacting Sun to ask about support took forever: it was at least four days
before they called me back, and then only to tell me they could not give me
a price without knowing how much storage I have (ugh, a pay-per-byte system,
great).

So, Lustre users, is it worth it? My setup would be 24 OSTs with about
100TB of storage, 10G Ethernet, RAID on each OST, and at least 20 or so
clients needing pretty fast read/write, connected via 10G Ethernet (yes, I
know I need a SAN, but the physical locations will not allow it and the price
is prohibitive, hence my looking at DPFSs). Am I on the right track looking
at Lustre, or should I go elsewhere? I also need commercial support of some
kind (although it seems Sun is unsure of themselves here; they did not
know who to contact when I called them: "Lustre, we make a product
called Lustre? Hold please"...).
