[Lustre-devel] global epochs [an alternative proposal, long and dry].
Nikita Danilov
Nikita.Danilov at Sun.COM
Tue Dec 23 05:24:05 PST 2008
Alex Zhuravlev writes:
> Nikita Danilov wrote:
> > The question was about SC being the single point of failure. This can be
> > eliminated by replicating stability messages to a few nodes.
>
> more complexity to work around the initial problem?
These are optional optimizations that are easy to implement later, should they
prove necessary.
>
> > But "works" always means at least "meets requirements". There is no such
> > thing as an efficient (or scalable) but incorrect program. Ordinary Lustre
> > recovery was implemented years ago and it still has problems. I bet
> > it looked very easy in the beginning, so it was tempting to optimize it.
>
> then we can just proceed with synchronous IO if scalability isn't a requirement.
> and the BKL is much better because of its simplicity.
Precisely. If Linus had based the initial Linux SMP implementation on
fine-grained locking, the Linux kernel would have ended up like... some
other Free, Beautifully Scalable kernel with a Daemon (slow, un-scalable,
and buggy). :-)
> > Suppose that transaction groups with these updates commit at the same
> > time and servers are ready to send information to each other. What
> > information does each server send, and to whom?
>
> I'll prepare a detailed description in a separate mail.
Thanks.
>
> > > yes, it's a well written and proven thing. the point is different: if it's clear that
> > > in some cases it doesn't work well (see the sync requirement), what does the proof give us?
> >
> > It assures you that it _works_. Maybe sub-optimally, but it does. A
> > program that is lightning fast, consumes zero memory and scales across
> > the galaxy is useless if it is incorrect.
>
> interesting point. it sounds as if it's absolutely impossible to prove
> another approach (somehow). having something "proved" doesn't mean we shouldn't
> try another approach to avoid sub-optimal but important cases, does it?
We definitely should try, but I think a much more formal and rigorous
treatment than we are accustomed to is necessary for something as
fundamental as recovery.
>
>
> thanks, Alex
Nikita.