[Lustre-devel] global epochs [an alternative proposal, long and dry].

Alex Zhuravlev Alex.Zhuravlev at Sun.COM
Tue Dec 23 02:21:49 PST 2008


Nikita Danilov wrote:
> If we have no more than 1 reintegration in a given epoch on a given
> client, then the server that received an OP = (U(0), ..., U(N)) in epoch
> E from a client, can send to SC a message telling it that this client
> contains N volatile updates in epoch E, and whenever some server commits
> one of U's it sends to SC a message asking it to decrease a counter for
> this client. The most obvious implementation will batch these notifications,
> i.e., when a server commits a transaction group it notifies SC about all
> changes in one message. I personally don't think that is the best
> approach.

essentially this is very similar to dependency-based recovery, but with
none of its advantages, and with the SC tracking all state and being a single
point of failure. I think we need a more scalable solution.
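
To make the concern concrete, here is a minimal sketch of the counter scheme
quoted above, assuming one reintegration per client per epoch. The class and
method names (StabilityCoordinator, register, committed, all_committed) and the
messages counter are hypothetical, chosen for illustration only; nothing here
is Lustre code.

```python
class StabilityCoordinator:
    """Hypothetical SC: tracks volatile updates per (client, epoch)."""

    def __init__(self):
        self.volatile = {}   # (client, epoch) -> number of uncommitted updates
        self.messages = 0    # every notification costs one message to the SC

    def register(self, client, epoch, n_updates):
        # Sent by the server that received the reintegration OP.
        key = (client, epoch)
        if key in self.volatile:
            raise ValueError("only one reintegration per client per epoch")
        self.volatile[key] = n_updates
        self.messages += 1

    def committed(self, client, epoch, count=1):
        # Sent by a server after a commit; count > 1 models a batched
        # notification covering a whole transaction group.
        key = (client, epoch)
        if count > self.volatile[key]:
            raise ValueError("more commits reported than updates registered")
        self.volatile[key] -= count
        self.messages += 1

    def all_committed(self, epoch):
        # True once no client has volatile updates left in this epoch.
        return all(n == 0 for (_, e), n in self.volatile.items() if e == epoch)


sc = StabilityCoordinator()
sc.register("client-1", epoch=5, n_updates=3)
sc.committed("client-1", epoch=5, count=2)   # batched notification
sc.committed("client-1", epoch=5)            # last update commits
```

Even in this toy form, every counter lives on the SC and every registration or
commit costs a message to it, which is the scalability and single-point-of-failure
problem.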

> Yes, and this mechanism (if it is correct at all) will guarantee that an
> epoch cannot depend on a future epoch.

again, it's not about dependency, it's about the network overhead of global epochs.

>  > just to list my observations about global epochs:
>  >   * it's a problem to implement synchronous operations
>  >   * network overhead even with local-only changes depending on workload
>  >   * disk overhead even with local-only changes
>  >   * SC is a single point of failure with any topology as it's the only place to
>  >     find final minimum
>  >   * tree reduction isn't an obvious thing because a client can't report its
>  >     minimum to an arbitrary node; instead the tree is rather static and any
>  >     change should be done very carefully, otherwise it's very easy to lose the minimum
> 
> Unfortunately, as far as I know, no other solution was described with a
> level of detail sufficient to compare. :-)

I could say the same about tree reduction, for example ;)
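
As a rough illustration of the tree-reduction concern listed above, here is a
minimal sketch assuming a static tree in which each node reports the minimum
epoch of its subtree to its parent. The function reduce_min, the node names and
the epoch values are all hypothetical; this is not a description of any
proposed implementation.

```python
def reduce_min(tree, reported, node):
    """Minimum epoch over the subtree rooted at node.

    tree:     maps a node to the list of its children
    reported: maps a node to the minimum epoch it reports for itself
    """
    values = [reduce_min(tree, reported, child) for child in tree.get(node, [])]
    if node in reported:
        values.append(reported[node])
    return min(values)


# Epochs the clients currently hold volatile updates in.
reported = {"c1": 7, "c2": 4, "c3": 9}

# Static tree: the SC at the root, two servers, three clients.
tree = {"SC": ["s1", "s2"], "s1": ["c1", "c2"], "s2": ["c3"]}
stable_min = reduce_min(tree, reported, "SC")

# Transient state while c2 is being moved from s1 to s2: for a moment
# neither server counts it, so its minimum is silently dropped.
transient = {"SC": ["s1", "s2"], "s1": ["c1"], "s2": ["c3"]}
lost_min = reduce_min(transient, reported, "SC")
```

In the transient tree the root computes 7 although c2 still holds epoch 4, so
the global minimum is overestimated. That is the kind of loss a careless
topology change can cause.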

dependency-based recovery was discussed in considerable detail, I think, and its
benefits are very clear, IMHO, as is its overall simplicity: the implementation is
local (compared with an implementation involving all nodes in a cluster).

thanks, Alex



