[Lustre-discuss] Regarding redundancy

Brian J. Murrell Brian.Murrell at Sun.COM
Tue Apr 7 08:43:19 PDT 2009


On Tue, 2009-04-07 at 08:34 -0700, Jim Garlick wrote:
> 
> Discarding all transactions

Only transactions subsequent to a missing transaction.

>  causes a lot of collateral damage in a
> multi-cluster, mixed parallel job environment where "file-per-process"
> style I/O predominates.

Indeed, depending on where the AWOL client's transaction sits in the
replay stream.  If it was the last transaction, the loss is minimal,
but if it was the first transaction, the loss is maximal.
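
To make the "where it sits in the stream" point concrete, here is a toy
sketch of serial replay with one missing client.  It is only an
illustration, not Lustre code: the function name, the (transno, client)
tuples and the client names are all made up for the example.

```python
def serial_replay(stream, missing_client):
    """Toy model: replay in transno order, discard everything after the gap.

    `stream` is a list of (transno, client) pairs (hypothetical format).
    The missing client's own transactions never arrive, so they appear in
    neither list.
    """
    replayed, discarded = [], []
    gap_seen = False
    for transno, client in stream:
        if client == missing_client:
            gap_seen = True   # first hole in the stream
            continue
        (discarded if gap_seen else replayed).append(transno)
    return replayed, discarded


stream = [(1, "c1"), (2, "c2"), (3, "c3"), (4, "c4")]
print(serial_replay(stream, "c4"))  # ([1, 2, 3], [])
print(serial_replay(stream, "c1"))  # ([], [2, 3, 4])
```

With the AWOL client last in the stream nothing else is lost; with it
first, every other client's transactions are discarded.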

> Could somebody remind me of the use cases protected by this behavior?

Simply transactional dependency.

If you don't know what the AWOL client did to a given file, you cannot
reliably process any further updates to that file, and if you don't have
the AWOL client to ask what files it has transactions for, everything
subsequent to that client's transaction has to be suspect.  While I
don't have any examples off-hand, I am sure one of the devs who
constantly has their fingers in replay can cite many actual scenarios
where this is a problem.

> In the case of I/O to a shared file, aren't Lustre's error handling
> obligations met by evicting the single offending client?

No.  All clients subsequently have to be evicted, per the above.

> Perhaps I am
> thinking too provincially because in our environment, I/O to shared
> files generally (always?) takes place in the context of a parallel job,
> and the single client eviction and EIO (or reboot of client) should
> be sufficient to terminate the whole job with an error.

Yours is probably a scenario where VBR (version-based recovery) will do
really well then, given that VBR only serializes replay on truly
dependent transactions rather than on the single serial stream (of
assumed dependent transactions) that replay currently operates with.
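
As a rough illustration of the difference, here is a toy model of
per-object dependency checking.  It is a sketch under assumptions, not
the actual VBR implementation: the function name, the tuple layout and
the idea of using the transno as the object's new version are chosen for
the example only.

```python
def version_based_replay(disk_versions, stream, missing_client):
    """Toy model: a transaction replays only if the object it touches is
    still at the version the client saw before its original operation.

    `stream` holds (transno, client, obj, pre_version) tuples
    (hypothetical format).  The missing client's transactions never arrive.
    """
    versions = dict(disk_versions)   # object -> version currently on disk
    applied, rejected = [], []
    for transno, client, obj, pre_version in stream:
        if client == missing_client:
            continue                 # the hole in the stream
        if versions.get(obj, 0) == pre_version:
            versions[obj] = transno  # assumed: new version is the transno
            applied.append(transno)
        else:
            rejected.append(transno) # depends on a lost update
    return applied, rejected


disk = {"shared": 0, "fileB": 0}
stream = [
    (1, "c1", "shared", 0),  # AWOL client's update, never replayed
    (2, "c2", "fileB", 0),   # file-per-process I/O, independent
    (3, "c3", "shared", 1),  # depends on transaction 1
    (4, "c2", "fileB", 2),   # depends only on transaction 2
]
print(version_based_replay(disk, stream, "c1"))  # ([2, 4], [3])
```

In this model only the client that actually depended on the lost update
(c3) fails replay, while the file-per-process client (c2) recovers
fully, even though both come after the gap.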

b.
