[Lustre-discuss] Lustre 1.6.3 - where are the bug fixes?

Kevin Canady kevinc at clusterfs.com
Fri Oct 12 00:20:03 PDT 2007


Nikke,
Are you a customer of Lustre Support? I don't have you listed as a supported
customer.  Maybe we should arrange a discussion about how we could assist
you more effectively.

Best regards,
Kevin 
-- 
P. Kevin Canady
Director, Business Development
Lustre Group (Formerly CFS)
Sun Microsystems, Inc.
O: 415.928.3633
C: 415.505.7701


On 10/11/07 11:53 PM, "Niklas Edmundsson" <Niklas.Edmundsson at hpc2n.umu.se>
wrote:

> 
> OK, I know that there is supposedly some QA before lustre releases and
> that it might be the reason for fixes taking such a long time to
> propagate, but still: It takes too long for fixes to end up in a
> released version...
> 
> During our rather limited testing on Ubuntu Dapper (using the Debian
> 2.6.18 kernel on servers and pkg-lustre packaging) we've run into
> a couple of bugs, most of them with the typical "fix in bugzilla".
> 
> The pkg-lustre packaging has six fixes from bugzilla applied; they
> seem to have munged the bug numbers, but it seems that only three of
> them are in the 1.6.3 changelog.
> 
> We have locally applied fixes from bug 13438 (lustre is totally
> useless without it due to servers OOPS:ing) and 13614. Neither of them
> seems to be in the 1.6.3 changelog.
> 
> So, I'd suggest that CFS get their act together and start releasing
> versions more often. If they'd done this during 1.6 development, we
> wouldn't now be installing production releases that you can crash after
> a day of testing.
> 
> If QA is the argument for not doing releases more often, consider the
> fact that known broken releases that you have to patch yourself with
> patches hidden in bugzilla aren't much better.
> 
> In reality, I think that doing non-QA'd snapshot releases might be the
> way to go. That is, releases with the useful more-or-less trivial
> fixes that avoid crashes etc. and that will be included in the next
> QA'd release. They would not be suitable for production, but at least
> you can rather easily download the latest snapshot and try on your
> test cluster and see if it fixes the problem(s) you've encountered.
> And if it does, we can bug CFS until they get their act together and
> get a release out with the fix included.
> 
> In the end, you have to realise that when you have a production system
> you don't want to wait for weeks and months for a new release that
> might fix a crash-inducing bug you're hitting. I say might here,
> because obviously having a fix hidden in bugzilla is no guarantee that
> it's included in a released version.
> 
> In our case we're not at production yet because of these problems with
> getting fixes out quickly enough. So far we've always been able to
> crash lustre 1.6 within days, and that's after waiting for 1.6 for
> well over a year.
> 
> So, I'd like to challenge CFS to get a version of lustre 1.6 (or 1.8,
> whatever) out that proves stable on our small lustre test setup.
> Without patches. In the year of 2007.
> 
> Since the "internal QA only" approach obviously isn't working, I'd
> suggest that you embrace "release early, release often" to get there.
> That means one release per week as long as you have fixes pending to
> get a decent churn on things.
> 
> 
> /Nikke




