[Lustre-discuss] Lustre 1.6.3 - where are the bug fixes?
Niklas Edmundsson
Niklas.Edmundsson at hpc2n.umu.se
Thu Oct 11 23:53:45 PDT 2007
OK, I know that there is supposedly some QA before lustre releases and
that it might be the reason for fixes taking such a long time to
propagate, but still: It takes too long for fixes to end up in a
released version...
During our rather limited testing on Ubuntu Dapper (using the Debian
2.6.18 kernel on servers and pkg-lustre packaging) we've run into
a couple of bugs, most of them with the typical "fix in bugzilla".
The pkg-lustre packaging has six fixes from bugzilla applied. The bug
numbers seem to have been munged, but as far as I can tell only three
of them are in the 1.6.3 changelog.
We have locally applied fixes from bug 13438 (lustre is totally
useless without it due to servers OOPS:ing) and 13614. Neither of them
seems to be in the 1.6.3 changelog.
So, I'd suggest that CFS gets their act together and starts releasing
versions more often. If they'd done this during 1.6 development, we
wouldn't now be installing production releases that you can crash
after a day of testing.
If QA is the argument for not doing releases more often, consider the
fact that known broken releases that you have to patch yourself with
patches hidden in bugzilla aren't much better.
In reality, I think that doing non-QA'd snapshot releases might be the
way to go. That is, releases with the useful more-or-less trivial
fixes that avoid crashes etc. and that will be included in the next
QA'd release. They would not be suitable for production, but at least
you can rather easily download the latest snapshot, try it on your
test cluster and see if it fixes the problem(s) you've encountered.
And if it does, we can bug CFS until they get their act together and
get a release out with the fix included.
In the end, you have to realise that when you have a production system
you don't want to wait for weeks and months for a new release that
might fix a crash-inducing bug you're hitting. I say might here,
because obviously having a fix hidden in bugzilla is no guarantee that
it's included in a released version.
In our case we're not at production yet because of these problems with
getting fixes out quickly enough. So far we've always been able to
crash lustre 1.6 within days, and that's after waiting for 1.6 for
well over a year.
So, I'd like to challenge CFS to get a version of lustre 1.6 (or 1.8,
whatever) out that proves stable on our small lustre test setup.
Without patches. In the year of 2007.
Since the "internal QA only" approach obviously isn't working, I'd
suggest that you embrace "release early, release often" to get there.
That means one release per week as long as you have fixes pending to
get a decent churn on things.
/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | nikke at hpc2n.umu.se
---------------------------------------------------------------------------
Short cut... the longest distance between two points.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=