[Lustre-discuss] NFS vs Lustre

Mon Aug 31 23:27:39 PDT 2009

Hi!

On Mon, Aug 31, 2009 at 04:34:58PM -0400, Brian J. Murrell wrote:
> On Mon, 2009-08-31 at 21:56 +0200, Daniel Kobras wrote:
> > Lustre's
> > standard config follows POSIX and allows dirty client-side caches after
> > close(). Performance improves as a result, of course, but in case something
> > goes wrong on the net or the server, users potentially lose data just like on
> > any local POSIX filesystem.
> 
> I don't think this is true.  This is something that I am only
> peripherally knowledgeable about and I am sure somebody like Andreas or
> Johann can correct me if/where I go wrong...
> 
> You are right that there is an opportunity for a client to write to an
> OST and get its write(2) call returned before data goes to physical
> disk.  But Lustre clients know that, and therefore they keep the state
> needed to replay that write(2) to the server until the server sends back
> a commit callback.  The commit callback is what tells the client that
> the data actually went to physical media and that it can now purge any
> state required to replay that transaction.
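
To make the mechanism described above concrete, here is a toy model in Python.
It is only a sketch of the idea (keep replay state until the commit callback
arrives), not Lustre code: the names ToyServer, ToyClient, on_commit and
evicted are made up for illustration, and real recovery is far more involved.

```python
class ToyServer:
    """Hypothetical server: acknowledges writes before they reach disk."""

    def __init__(self):
        self.in_memory = {}  # acknowledged, not yet on stable storage
        self.on_disk = {}    # committed to stable storage

    def write(self, xid, data):
        self.in_memory[xid] = data  # the client's write returns here

    def commit(self):
        """Flush to disk; return the transaction ids that are now safe."""
        committed = list(self.in_memory)
        self.on_disk.update(self.in_memory)
        self.in_memory.clear()
        return committed

    def crash(self):
        self.in_memory.clear()  # uncommitted writes are gone


class ToyClient:
    """Hypothetical client: keeps replay state until commit is confirmed."""

    def __init__(self, server):
        self.server = server
        self.replay_log = {}
        self.next_xid = 0

    def write(self, data):
        xid = self.next_xid
        self.next_xid += 1
        self.replay_log[xid] = data  # kept until the commit callback
        self.server.write(xid, data)
        return xid

    def on_commit(self, xids):
        for xid in xids:
            self.replay_log.pop(xid, None)  # safe to purge now

    def replay(self):
        for xid, data in sorted(self.replay_log.items()):
            self.server.write(xid, data)

    def evicted(self):
        """Eviction throws the replay state away; return what was dropped."""
        lost = list(self.replay_log.values())
        self.replay_log.clear()
        return lost


def demo():
    server = ToyServer()
    client = ToyClient(server)
    client.write(b"a")
    client.on_commit(server.commit())
    client.write(b"b")
    server.crash()                     # single failure: server loses "b"
    client.replay()                    # ...but the client can replay it
    client.on_commit(server.commit())
    recovered = dict(server.on_disk)
    client.write(b"c")
    server.crash()                     # double failure: server loses "c"
    lost = client.evicted()            # ...and the client is evicted too
    return recovered, lost, dict(server.on_disk)
```

In this model the single server crash is harmless because the client still
holds "b", whereas a crash combined with an eviction drops "c" on both sides.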

Lustre can recover from certain error conditions just fine, of course, but
it still cannot recover gracefully from others. Think double failures or, more
likely, connectivity problems to a subset of hosts. For instance, if an
Ethernet switch goes down for a few minutes while IB is still available, all
Ethernet-connected clients will get evicted. Users won't necessarily notice
that there was a problem, but they have potentially just lost data. VBR
(version-based recovery) makes data loss less likely in this case, but the
possibility is still there. I suspect you will always be able to construct
similar corner cases as long as the networked filesystem allows dirty caches
after close().
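
An application that cannot tolerate this can ask for the data to be flushed
before it closes the file. The following Python sketch shows the usual
pattern; write_durably is a made-up helper name, and it assumes the
filesystem honours fsync() by returning only once the data is on stable
storage. An error from fsync() is raised as OSError rather than being lost
silently after close().

```python
import os


def write_durably(path, data):
    """Write data to path and return only after fsync() has succeeded."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while len(view) > 0:
            written = os.write(fd, view)  # may write fewer bytes than asked
            view = view[written:]
        os.fsync(fd)  # raises OSError if the flush fails
    finally:
        os.close(fd)
```

The price is the latency of a synchronous flush on every file, which is
exactly the performance the dirty cache was buying in the first place.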

Regards,

Daniel.