[Lustre-discuss] Lustre 1.6.3 - where are the bug fixes?

Niklas Edmundsson Niklas.Edmundsson at hpc2n.umu.se
Thu Oct 11 23:53:45 PDT 2007


OK, I know that there is supposedly some QA before lustre releases and 
that this might be why fixes take such a long time to propagate, but 
still: it takes too long for fixes to end up in a released version...

During our rather limited testing on Ubuntu Dapper (using the Debian 
2.6.18 kernel on servers and pkg-lustre packaging) we've run into 
a couple of bugs, most of them with the typical "fix in bugzilla".

The pkg-lustre packaging has six fixes from bugzilla applied. They 
seem to have munged the bug numbers, but it appears that only three of 
them are in the 1.6.3 changelog.

We have locally applied the fixes from bugs 13438 (lustre is totally 
useless without it due to servers OOPSing) and 13614. Neither of them 
seems to be in the 1.6.3 changelog.

So, I'd suggest that CFS gets their act together and starts releasing 
versions more often. If they'd done this during 1.6 development, we 
wouldn't now be installing production releases that you can crash 
after a day of testing.

If QA is the argument for not doing releases more often, consider the 
fact that known-broken releases that you have to patch yourself with 
patches hidden in bugzilla aren't much better.

In reality, I think that doing non-QA'd snapshot releases might be the 
way to go. That is, releases with the useful more-or-less trivial 
fixes that avoid crashes etc. and that will be included in the next 
QA'd release. They would not be suitable for production, but at least 
you can rather easily download the latest snapshot, try it on your 
test cluster and see if it fixes the problem(s) you've encountered. 
And if it does, we can bug CFS until they get their act together and 
get a release out with the fix included.

In the end, you have to realise that when you have a production system 
you don't want to wait weeks or months for a new release that might 
fix a crash-inducing bug you're hitting. I say "might" here because 
obviously having a fix hidden in bugzilla is no guarantee that it's 
included in a released version.

In our case we're not at production yet because of these problems with 
getting fixes out quickly enough. So far we've always been able to 
crash lustre 1.6 within days, and that's after waiting for 1.6 for 
well over a year.

So, I'd like to challenge CFS to get a version of lustre 1.6 (or 1.8, 
whatever) out that proves stable on our small lustre test setup. 
Without patches. In the year of 2007.

Since the "internal QA only" approach obviously isn't working, I'd 
suggest that you embrace "release early, release often" to get there. 
That means one release per week as long as you have fixes pending to 
get a decent churn on things.


/Nikke
-- 
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
  Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se     |    nikke at hpc2n.umu.se
---------------------------------------------------------------------------
  Short cut... the longest distance between two points.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=



