[Lustre-devel] Technical debt in the lustre build system

Christopher J. Morrone morrone2 at llnl.gov
Fri May 6 14:53:31 PDT 2011


Eric Barton has been raising awareness about the need to address 
technical debt in the lustre code base.  I think we should also start 
talking about the technical debt in the lustre build system.

Over time, we've cobbled together a very complex and very fragile build 
system for lustre.  Every time I work on the build system, my 
frustration level builds and I am tempted to pull the whole thing apart 
and start from scratch.  But when I am thinking more rationally, I admit 
that a more evolutionary approach to improving the build system would be 
more likely to succeed.

So I'd like to start a discussion about where we need to go with the 
build system.  Here are some problems, off the top of my head, that need 
to be addressed, along with some improvements that I think we should 
make.

1) The recursive configure system should be removed.

Each component that requires its own build system should be a standalone 
package.  Each standalone package should have proper Requires:, 
BuildRequires:, Provides:, etc. in the .spec file for rpm.  Appropriate 
equivalents should be used in other packaging systems (deb).

ldiskfs is a good candidate for this.  With the changes to support 
multiple backend filesystems, making the backends separate packages 
makes even more sense than it did in the past.  In fact, LLNL has 
already packaged ldiskfs separately for 2.1.  It would be great if the 
rest of the community adopted this approach in a future release.

I am guessing that the snmp directory could easily be its own package as 
well.

Let's identify more things like that.

2) Installed files need serious cleanup and reorganization.  Case in 
point, the main lustre package installs this file:

   /usr/bin/config.sh

This pretty much wins the lifetime award for Poorly Named Command In A 
Standard Path Location.  There are many others such as obdfilter-survey, 
ost-survey, parse-ior, plot-obdfilter, etc. that are clearly useful 
testing tools, but inappropriate for the main lustre rpm package.

3) Remove old build system tools dealing with CVS or Subversion 
repositories.  We've moved to git, and it is clearly superior.  We are 
not going back.  It is time to remove the cruft.

4) make_META.pl -> version_tag.pl.  Why is make_META.pl part of the 
build system and just a symlink to version_tag.pl?  I don't understand 
the rationale on this one.  It is mighty confusing when you need to fix 
a bug in make_META.pl, but no file named make_META.pl exists in your 
source tree.

5) Need to keep in mind that third parties will be building this, and 
will need the flexibility to have their own tags and versioning schemes. 
We can partly do this now, but it needs improvement.

Some of the code to check git version numbers and tags and such seems 
like it was well intentioned, but just adds too much complexity to an 
already complex problem.  Let's look into ways to simplify this.

6) The lustre.spec file.

Let's face it, rpm's spec language is just awful.  But it is what we are 
stuck with for most of our platforms, so we need to figure out how to 
live with it.  Lustre's spec file is a bit of a mess now, and pretty 
difficult for those of us downstream to use unmodified.  Some of the 
previous suggestions will naturally improve the state of the spec file, 
but additional improvements are needed.  I think we should take another 
look at the decision to parse --with-linux and --with-linux-objs out of 
%configure_args.  It just makes the interactions between various rpm 
variables and configure arguments too complex, in my opinion.

I think that we can take some inspiration here from Brian Behlendorf's 
zfs-modules.spec.in file in his ZFS repo:

   https://github.com/behlendorf/zfs

Brian has gone to great lengths to make ZFS buildable under just about 
every Linux distro under the sun, and I am still able to understand his 
spec file.  I can't say the same for Lustre's spec file, and lustre 
doesn't build nearly as cleanly.

Granted, lustre is a bit more complex in some ways, but by splitting the 
code into multiple projects I think we can reduce the spec file complexity.

7) build/lbuild-*

What is this stuff?  Does anyone outside of the core CFS/Sun/Oracle/etc. 
team use this?  Seriously, if you do, please speak up.

I know that LLNL has never used it.  Frankly, I think it should be 
removed from the main Lustre tree.  My impression, from a brief skimming 
of the files, is that they are the automated build system that upstream 
has used to generate kernel packages, lustre packages, and maybe IB 
packages.

LLNL uses an automated build environment based on buildbot that builds 
lustre and all of our other packages under a chroot environment 
individually created for each package by "mock".  Each chroot contains 
only the rpms needed by the package, which forces us to get our spec 
file dependencies correct (another reason why the lustre.spec often 
doesn't work for us).

That is a bit of a digression, but my point is this: we probably all 
have our own build systems to contend with.  Those scripts shouldn't be 
part of the main lustre tree.  They should be a separate package, or 
just Whamcloud's internal scripts if no one else is using them.

8) Lustre .src.rpm should be rebuildable.  It is now, more-or-less, but 
could use improvement.

So where do we go from here?  I think we should set up a wiki page to 
plan the overhaul, and start opening bugs to track individual changes 
that need to be made.

Making a large overhaul for 2.1 is out of the question, but perhaps we can 
make many of the changes in the next release.

Chris



