[Lustre-discuss] one server node fails, its all dead?

Craig Tierney Craig.Tierney at noaa.gov
Mon Feb 2 13:52:56 PST 2009


Robert Minvielle wrote:
> More testing today. I downed a server (OST) to see what would happen. Well,
> it does follow the FAQ :) The FAQ states:
> 
> -- begin FAQ --
> I don't need failover, and don't want shared storage. How will this work?
> 
> If Lustre is configured without shared storage for failover, and a server node fails, then a client that tries to use that node will pause until the failed server is returned to operation. After a short delay (a configurable timeout value), applications waiting for those nodes can be aborted with a signal (kill or Ctrl-C), similar to the NFS soft-mount mode.
> 
> When the node is returned to service, applications which have not been aborted will continue to run without errors or data loss.
> 
> -- end FAQ --
> 
> So, if I have a server that goes down, the clients are out of luck. I have
> a hard time believing this is "acceptable". Ok, so it is "as good as" NFS,
> but I mean really, if a single storage unit fails all of my clients can do
> nothing? Am I missing something here or is this by design? 

The FAQ says that if a server node fails, any client trying to access that
server node will block until the node is back in service.  If you want neither
failover nor shared storage, what else is Lustre supposed to do?  Other clients
can still write data to the remaining nodes, and they can read any data that
lives on a working server node.
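
To make that concrete, here is a rough sketch of which files are affected
when one OST is down.  It is a toy model, not Lustre code: the function names
are made up for illustration, and it assumes each file is striped round-robin
over a fixed number of OSTs, which is a simplification of how objects are
really allocated.

```python
# Toy model only: not Lustre code and not its real allocation policy.
# Assumption: a file is striped round-robin over `stripe_count` OSTs,
# beginning at OST index `start_ost`. All names here are hypothetical.

def osts_for_file(start_ost, stripe_count, total_osts):
    """Return the set of OST indices that hold objects of one file."""
    return {(start_ost + i) % total_osts for i in range(stripe_count)}


def file_blocks(start_ost, stripe_count, total_osts, failed_osts):
    """True if the file has an object on a failed OST, so I/O can block."""
    used = osts_for_file(start_ost, stripe_count, total_osts)
    return bool(used & set(failed_osts))


def blocked_fraction(stripe_count, total_osts, failed_osts):
    """Fraction of starting positions whose files touch a failed OST."""
    blocked = sum(
        file_blocks(start, stripe_count, total_osts, failed_osts)
        for start in range(total_osts)
    )
    return blocked / total_osts


if __name__ == "__main__":
    # 24 OSTs as in the setup described in this thread, with OST 5 down.
    for stripe_count in (1, 4, 24):
        frac = blocked_fraction(stripe_count, 24, [5])
        print(f"stripe_count={stripe_count}: {frac:.3f} of layouts affected")
```

In this model, with 24 OSTs and one of them down, files with a stripe count of
1 are affected in 1 of 24 starting positions and files with a stripe count of
4 in 4 of 24, while files striped over all 24 OSTs are always affected.
Narrower striping limits how much of the filesystem blocks when a single
server is out.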

> The real reason
> I ask is that I am testing Lustre against a few other DPFS to see if we will
> move to Lustre. So far, some things are nice, and some are not nice. Writing
> seems to be faster, but reading is slower (than my other test DPFSs). 
> Contacting Sun to ask about support took forever. At least four days for them
> to just call me back and tell me they could not give me a price without 
> knowing how much storage I have (ugh, a pay per byte system, great). 
> 

Performance is going to vary greatly with the storage hardware used.  Sun
can help there (if you can find the right people to talk to).

> So, Lustre users, is it worth it? My setup would be 24 OST's with about
> 100TB of storage, 10G ethernet, RAID on each OST, at least 20 or so clients
> needing pretty fast read/write, connected via 10G ethernet (yes, I know I 
> need a SAN but the physical locations will not allow it and the price is
> prohibitive, hence my looking at DPFSs)... Am I on the right track looking
> at Lustre, or should I go elsewhere? I also need commercial support of some
> kind (although it seems Sun is unsure of themselves here, they did not 
> know who to contact when I contacted them "Lustre, we make a product
> called Lustre? Hold please"... 

Other companies can provide Lustre support.  Data Direct Networks provides
a packaged solution with their hardware.  It can be configured with failover
and all the other goodies one would want to ensure maximum uptime.  Terascala
provides Lustre support with their hardware.  Although they are working on
(or may already have delivered) failover configurations, their initial
releases were non-shared-storage, no-failover configurations.

Craig





-- 
Craig Tierney (craig.tierney at noaa.gov)


