[Lustre-devel] global epochs [an alternative proposal, long and dry].
Alexander Zarochentsev
Alexander.Zarochentsev at Sun.COM
Mon Dec 22 05:48:18 PST 2008
On 22 December 2008 15:45:51 Nikita Danilov wrote:
> Alex Zhuravlev writes:
> > Hello,
>
> > I'm not sure it scales well as any failed node may cause global
> > stuck in undo/redo pruning.
>
> Only until this node is evicted, and I think that no matter what is
> the pattern of failures, a single level of `tree reduction', can be
> delayed by no more than a single eviction timeout.
It introduces an unneeded dependency between nodes: no node can prune
its own undo logs until all nodes agree that the epoch can be
pruned. IMO this is what a scalable system should avoid.
If we had a disaster in part of the cluster, client nodes would
disconnect and reconnect often, the undo logs would be overloaded, and
the cluster would stop, no?
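To illustrate the dependency, here is a minimal Python sketch. It assumes
pruning is gated on the lowest stable epoch reported by any node; the
function names, node names and epoch numbers are hypothetical and are not
taken from the proposal or from Lustre code.

```python
def prunable_epoch(last_stable_epoch):
    """Newest epoch that every known node has reported as stable."""
    return min(last_stable_epoch.values())


def prune(undo_log, agreed_epoch):
    """Keep only the undo records newer than the agreed epoch."""
    return [rec for rec in undo_log if rec["epoch"] > agreed_epoch]


# Hypothetical reports: oss1 has failed and stopped advancing at epoch 4.
reports = {"mds0": 12, "oss0": 11, "oss1": 4}
undo_log = [{"epoch": e, "op": "op-%d" % e} for e in range(1, 13)]

# While oss1 is still a member, it holds back pruning on every node.
agreed = prunable_epoch(reports)
remaining = prune(undo_log, agreed)

# After oss1 is evicted, the healthy nodes can agree on a newer epoch.
healthy = {node: e for node, e in reports.items() if node != "oss1"}
agreed_after_eviction = prunable_epoch(healthy)
remaining_after_eviction = prune(undo_log, agreed_after_eviction)
```

In this sketch every node keeps 8 of its 12 undo records while oss1 is
stuck, and only 1 once oss1 is evicted. If nodes keep failing and
reconnecting, the agreed epoch rarely advances and the logs keep growing.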
Thanks,
--
Alexander "Zam" Zarochentsev
Staff Engineer
Lustre Group, Sun Microsystems