[Lustre-devel] global epochs [an alternative proposal, long and dry].

Alexander Zarochentsev Alexander.Zarochentsev at Sun.COM
Mon Dec 22 05:48:18 PST 2008


On 22 December 2008 15:45:51 Nikita Danilov wrote:
> Alex Zhuravlev writes:
>  > Hello,
>
>  > I'm not sure it scales well as any failed node may cause global
>  > stuck in undo/redo pruning.
>
> Only until this node is evicted, and I think that no matter what is
> the pattern of failures, a single level of `tree reduction', can be
> delayed by no more than a single eviction timeout.

It introduces an unneeded dependency between nodes: no node can prune
its own undo logs until all nodes have agreed that the epoch can be
pruned. IMO that is what a scalable system should avoid.
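
To make the dependency concrete, here is a minimal sketch in Python. It
is not Lustre code: the node names, the stable_epochs map and both
helper functions are hypothetical, and it assumes the prunable epoch is
simply the minimum of the epochs reported by every node, with None
standing for a failed node that has not been evicted yet.

```python
# Hypothetical model, not Lustre code: each node reports the newest epoch
# it has made stable; None marks a failed node that is not yet evicted.

def global_prunable_epoch(stable_epochs):
    """Newest epoch that every node agrees is stable, or None if unknown."""
    if not stable_epochs:
        return None
    if any(epoch is None for epoch in stable_epochs.values()):
        # One silent node blocks the agreement for the whole cluster.
        return None
    return min(stable_epochs.values())


def prune(undo_log, upto):
    """Drop undo records whose epoch is <= upto; keep all if upto is None."""
    if upto is None:
        return list(undo_log)
    return [record for record in undo_log if record[0] > upto]


# Records are (epoch, payload) pairs held in one node's local undo log.
undo_log = [(5, "a"), (6, "b"), (8, "c")]
reports = {"mds0": 7, "oss1": 9, "oss2": None}

# While oss2 is silent, nothing can be pruned anywhere.
stalled = prune(undo_log, global_prunable_epoch(reports))

# After oss2 is evicted and dropped from the map, pruning proceeds.
del reports["oss2"]
resumed = prune(undo_log, global_prunable_epoch(reports))
```

In this model a node's local log shrinks only when the slowest or
failed node lets the global minimum advance, which is the coupling
objected to above.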

If we had a disaster in part of the cluster, client nodes would
disconnect and reconnect often, the undo logs would be overloaded, and
the cluster would stop, no?

Thanks,
-- 
Alexander "Zam" Zarochentsev
Staff Engineer
Lustre Group, Sun Microsystems


