In this post, I’m going to answer a seemingly simple question: Why does neither Debian nor Fedora have a well-defined, reliable process to rebuild everything from source?
I’m often surprised by how many people I encounter in the FOSS world, even experienced developers, who are intimidated by the idea of building “everything” from source, think it’s crazy, or just don’t think it’s worth it. Let’s assume for the purposes of this discussion that rebuilding from source is valuable. I mean, after all, it’s Free Software, not Free Binaries Wrapped With Some Metadata.
First, let me define “everything”: the goal here is to construct a basic Linux-based system that boots in qemu and lets you log in as root. A perfect example of our goal here is to build the source code that comprises JS/Linux: the kernel, bash, glibc, gcc, and so on. If you read the tech notes, you’ll see it’s built using Buildroot.
The first thing to observe is that not only do multiple projects exist to accomplish this goal, they do so with a high degree of reliability and solve real-world needs. For example, the Yocto project’s “core-image-minimal” target gets you basically this same thing: you just run bitbake core-image-minimal, and everything else is done for you. Likewise, a quick read of the Buildroot manual will show you just how little needs configuration or manual intervention.
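To make that concrete, here is a rough sketch of the two workflows (the commands below are illustrative; check the Buildroot and Yocto documentation for the authoritative steps):

```
# Buildroot (from a Buildroot source checkout): pick one of the shipped
# qemu configurations, then build everything with one command.
make qemu_x86_64_defconfig
make
# The resulting kernel and root filesystem images end up under output/images/.

# Yocto/Poky (from a Poky source checkout): set up a build directory,
# then ask bitbake for the minimal image.
source oe-init-build-env
bitbake core-image-minimal
runqemu qemux86-64   # or qemux86, matching your MACHINE setting
```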
The second thing to note about systems like Buildroot and Yocto is that they are not, by default, self-hosting. The host and target systems need not be the same. For example, you can use Yocto to build “core-image-minimal” from a Red Hat Enterprise Linux 6 system, an Ubuntu 12.04 system, and a variety of others. In fact, you can even do full cross builds from x86_64 to ARM. Interestingly, Yocto can generate self-hosting systems, but that’s not the default.
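For the cross-build case, the only thing that changes in a Yocto build is the MACHINE setting; for example (a sketch, assuming the standard qemuarm machine shipped with openembedded-core):

```
# In the Yocto build directory: target ARM instead of the default x86 machine.
echo 'MACHINE = "qemuarm"' >> conf/local.conf
# The same command now performs a full cross build from the x86_64 host.
bitbake core-image-minimal
```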
We’re getting closer to answering our original question. Let’s further observe that both Debian and Fedora are defined to be self-hosting systems. Why is self-hosting a problem? It’s because of circular build dependencies. The classic example is gcc, which is written in the C programming language; in order to build it, you need a C compiler already. Build systems of the Yocto/Buildroot type get out of this problem in a simple way: they assume you already have a functioning gcc on the host system.
But in Debian and Fedora, in order to build the gcc package, you need gcc already built as a package; the build system won’t accept just having a “gcc” binary in the $PATH. That’s how the build systems work because, again, that’s how the projects are defined.
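You can see the cycle directly in the package metadata. A rough illustration (exact source package names vary by release):

```
# Debian: the C library's build dependencies include compiler packages...
apt-cache showsrc glibc | grep ^Build-Depends
# ...and the compiler's build dependencies include binutils and libc packages.
apt-cache showsrc gcc-12 | grep ^Build-Depends

# Fedora: same story; installing gcc's build dependencies pulls in
# already-built gcc and glibc packages.
dnf builddep gcc      # yum-builddep gcc on older releases
```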
If you haven’t done this recently, grab a mirror you can hold in your hand, go into your bathroom, and point the hand mirror at the wall mirror. You’ll get an infinite recursion. It’s really quite beautiful and fun to do, but since I’m sure many of you won’t, there’s a good picture here.
This infinite recursion resulting from self-hosting is the reason there isn’t one reliable command to rebuild all of Debian or Fedora from source.
One question you might have: would it make sense to have a well-defined process for bootstrapping a self-hosting system like Debian? Some of the developers think so, and the DebianBootstrap wiki page describes the thoughts so far. Personally though, I think it’s both too complex and too vague. A much simpler, and ultimately more reliable, goal would be to ensure that version N of the system can be built by version N-1. So Fedora 17 can be built on a Fedora 16 system, Debian Wheezy can be built from Squeeze, Red Hat Enterprise Linux 6 can be built from Red Hat Enterprise Linux 5, etc. Eventually this is a goal I’d like to achieve for Red Hat Enterprise Linux at least. There’d be some cost to packages with circular build dependencies, but having a well-defined, reliable process for building from source: priceless.
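In practice, that N-from-N-1 goal maps directly onto the existing clean-chroot build tools. A minimal sketch (the package file names and chroot configs here are placeholders):

```
# Fedora: rebuild a Fedora 17 source RPM inside a clean Fedora 16 chroot.
mock -r fedora-16-x86_64 --rebuild somepackage-1.0-1.fc17.src.rpm

# Debian: rebuild a wheezy source package inside a clean squeeze chroot.
sudo pbuilder create --distribution squeeze
sudo pbuilder build somepackage_1.0-1.dsc
```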
I’m not technical enough to comment in detail, but Baserock *is* self-hosting by default, builds everything from source, and has been designed to build new versions of itself easily; my demo in Shanghai this week proved exactly that.
To get to this point we had to make the whole bootstrap of Baserock itself automatic and repeatable. I hope Lars or Daniel will write soon to provide a more thorough explanation.
Yeah, I guess it’d be fair to say Baserock places a higher emphasis on self-hosting, since it eschews cross builds. But it’s still a goal to be able to bootstrap from a variety of host systems (Debian, RHEL, etc.), right?
If that is the case, out of curiosity, what host systems do you guys test the builds from?
With Baserock, bootstrapping serves two purposes. First, it is a means of obtaining a system that can actually self-host. Second, since we don’t cross-compile, it is currently the only way to port Baserock to new architectures. Once that is achieved, bootstrapping only plays a minor role. One aspect that makes Baserock very different from other systems such as Yocto is that people are expected to use Baserock as the host system to build Baserock systems.
While we make sure bootstrapping is reproducible by testing it whenever we change anything that affects it, allowing it to be performed on all kinds of host systems (Debian, Fedora, etc.) is less important. We currently only support bootstrapping Baserock from Debian, which seems like the logical choice due to the availability of its x86 and ARM ports. Consequently, Debian is also what we test bootstrapping on.
Reiterating what I wrote in the beginning: one of the ideas behind Baserock is that it is host and target system at the same time. For bootstrapping it is not so important to support different distributions as it is to support different (relevant) architectures.
(This describes my personal perspective by the way.)
Actually, bootstrapping from multiple hosts isn’t a goal for Baserock. We bootstrapped from Debian, but development, building, and testing now happen in Baserock.
Ah, interesting, OK. Baserock is sort of hard to categorize in this post, so I just omitted it for now; but if you have any comments on where its strengths and weaknesses fit in here, that’d be interesting.
How do you choose where to break build cycles? Is it just hardcoded?
Our bootstrapping process is pretty much the only place where we try to break cycles. Sometimes things are more “interesting” because of the issues of building from upstream’s revision control where we can. However, we have the benefit that, by not trying to be a full “traditional” distribution, we have fewer build-dependency cycles to contend with. For example, we care very little about the construction of manpages, info files, etc., and where necessary we prevent them from being built in order to reduce or break cycles.
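(A generic illustration of that kind of cycle-breaking, not Baserock’s actual mechanism: many GNU packages can be built without their documentation and translation tooling, which drops build dependencies on texinfo/gettext and the cycles they can bring.)

```
# Illustrative only: skip translations and stub out makeinfo so no info
# pages are generated; --disable-nls is common but not universal.
./configure --prefix=/usr --disable-nls
make MAKEINFO=true
make install MAKEINFO=true
```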
I shall have a ponder about the rest of your article over a cuppa later.
You should definitely take a look at NixOS/Nix: a distribution/package manager which is purely functional and supports building from source, where everything required to build a full system is just a single declaration file.
See http://nixos.org/ for more details.
Nix is a self-hosting system. They have the same bootstrapping issues as Debian/Fedora do. See for example:
http://comments.gmane.org/gmane.linux.distributions.nixos/8566
Remember, the assumption in my blog post is that you *don’t* have the binaries already built. So I probably should have said “building *only* from source”, and not “rebuilding”.
In my experience, the biggest problem was that the options that need to be passed to ./configure are non-trivial, especially for glibc and binutils.
It would definitely help if all the source code were available in a /src directory, so that I could easily tweak individual libs.
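For what it’s worth, here is a rough sketch of the kind of ./configure invocations involved when bootstrapping a toolchain from scratch (the triplets, paths, and versions are placeholders; real procedures such as Linux From Scratch or crosstool-ng need considerably more care):

```
# Stage-1 binutils, installed into a throwaway /tools prefix.
../binutils-2.x/configure \
    --prefix=/tools \
    --target=x86_64-bootstrap-linux-gnu \
    --with-sysroot=/mnt/rootfs \
    --disable-nls --disable-werror
make && make install

# glibc, built with the new cross tools against pre-installed kernel headers.
../glibc-2.x/configure \
    --prefix=/usr \
    --host=x86_64-bootstrap-linux-gnu \
    --build=x86_64-pc-linux-gnu \
    --with-headers=/mnt/rootfs/usr/include \
    --enable-kernel=3.2
make && make install install_root=/mnt/rootfs   # glibc's equivalent of DESTDIR
```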
It basically doesn’t make sense to hand-roll a build system; there are quite a number to choose from, with different tradeoffs.