On software engineering and optimization

From Bryan Cantrill’s blog:

Adding a hook of this nature requires an understanding of the degree to which the underlying code path is performance-critical. That is, to contemplate adding this hook, we needed to ask: how hot is closef(), anyway? Historically in my career as a software engineer, this kind of question would be answered with a combination of ass scratching and hand waving.

Sadly, too true in my experience as well. Today, when I’m reviewing a patch and a performance increase is claimed, I always ask for numbers and methodology. You’d think this would be the norm; most of the advancement in our society over the last few hundred years rests on the scientific method. But the problem is that it’s just too damn easy to modify software. Why bother actually measuring when we can just make a change, find out it’s broken later, then change it again immediately?

This gets to something that’s been on my mind lately, which is that we should only try to optimize for two things: latency and power usage. The nice thing about this is that “traditional” tradeoffs like space versus time are neatly encapsulated by power usage, because RAM, CPUs/GPUs, and hard disks all consume power. Is it a good idea to cache that file in memory (parse the file once, but force the system to retain it in RAM at a constant power draw), or to re-parse it whenever we need it and then discard the data (more CPU draw periodically, less constant RAM draw)? If you’re optimizing for power draw, measuring representative workloads gives you the answer. Even better, power usage is specific to particular machines, which is how real-world optimization works.
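To make the tradeoff concrete, here is a minimal C sketch (mine, not from the original post) of the two strategies: parse once and keep the result resident in RAM, versus re-parse on demand and discard. The “parsing” here is just slurping the file into memory; the point is only the shape of the two code paths. Which one actually wins is exactly the kind of question you answer by measuring power under a representative workload, not by intuition.

    #include <stdio.h>
    #include <stdlib.h>

    typedef struct { char *data; long len; } ParsedFile;

    /* Stand-in "parser": slurp the whole file into memory. */
    static ParsedFile *parse_file (const char *path)
    {
      FILE *f = fopen (path, "rb");
      if (!f)
        return NULL;
      fseek (f, 0, SEEK_END);
      long len = ftell (f);
      rewind (f);
      ParsedFile *p = malloc (sizeof *p);
      p->data = malloc (len > 0 ? len : 1);
      p->len = (long) fread (p->data, 1, (size_t) len, f);
      fclose (f);
      return p;
    }

    static void parsed_file_free (ParsedFile *p)
    {
      if (!p)
        return;
      free (p->data);
      free (p);
    }

    /* Strategy A: parse once, keep it resident (one-time CPU cost, constant RAM draw). */
    static ParsedFile *cached = NULL;
    static ParsedFile *get_cached (const char *path)
    {
      if (!cached)
        cached = parse_file (path);
      return cached;
    }

    /* Strategy B: re-parse on every use and discard (periodic CPU draw, no resident RAM). */
    static void use_once (const char *path)
    {
      ParsedFile *p = parse_file (path);
      if (p)
        printf ("parsed %ld bytes\n", p->len);
      parsed_file_free (p);
    }

    int main (void)
    {
      const char *path = "/etc/hostname";   /* arbitrary example input */
      if (get_cached (path))
        printf ("cached %ld bytes\n", cached->len);
      use_once (path);
      parsed_file_free (cached);
      return 0;
    }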

Definition of “upstream”

There’s a lot of terminology we tend to use in the Free Software community, but we lack any kind of widely accepted dictionary for our “industry jargon”. Wikipedia has pages on some of this, but Wikipedia isn’t the same thing as a dictionary.

Anyways, I want to attempt a definition for “upstream”:

upstream(n): A FOSS project with an active and robust peer-review process.

I rely here on the definitions of both “FOSS” and “project”. The Wikipedia page for FOSS is a good enough substitute for a dictionary entry, and let’s ignore for now the possible meanings of “project”. The emphasis in my definition is on “active and robust peer-review process”. Why is that?

Because basically, without peer review, there’s no interesting difference between, say, a Debian “package” (what many people seem to consider “downstream”) and a git repository on SourceForge (what people consider “upstream”). There’s no point in saying “push this change upstream” if that just means the change gets added to a git repository without robust inspection. All that happened was that some bytes got copied across the Internet from point A to point B.

GNOME as a platform

In the previous post, I discussed platforms and their relationship to “projects” and “products”. While I was writing it, I had in mind an old blog post from Havoc. It took me a while to find it…can’t believe it’s been 6 years. Anyways, you should go and read that post before continuing. Here’s the link again.

What I’d like to argue (and most of you probably agree) is that GNOME shouldn’t explicitly take the “building block” or “platform” approach. There are multiple reasons for this, but the strongest, I think, is that if we focus just on making a Free Software desktop that doesn’t suck, we will produce a platform as a side effect. And in fact, that’s exactly what has happened. Think of NetworkManager, for example. Getting a network experience (particularly with wireless) that was remotely competitive with Windows XP required us to invent a new networking system.

If we just said “we’re a bucket of parts” and weren’t the ones actually out in front trying to build a networking user interface, there would basically be no obvious driver for a networking API (besides toys and tests), so it wouldn’t be tested, and in practice it wouldn’t really work. Or at least, there would be an immense lag between a third-party engineer reporting problems with the API and getting them fixed.

Will third parties take the code and do things with it? Of course. And that’s allowed by the fact that GNOME is Free Software, and we want to “support” that for some values of “support”.

One thing bears mentioning: of course GNOME should be a platform for application authors. That’s in fact an important part of our place in the ecosystem. But as far as being a collection of parts versus something more, here’s the way I think of it: if you can walk up to a computer and say “Oh, that’s running GNOME”, i.e. we have a coherent design and visual identity, then we’re succeeding.

GNOME is not unique in being an end-user-focused Free Software project debating the platform versus project/product issue. See also the Mozilla platform versus Firefox; the role and relationship of those two have been the subject of (sometimes very contentious) debate in that community. And that’s fine: debating the line is good, as long as you keep producing something that doesn’t suck while you debate =)

Platforms as a side effect

What I want to talk about here is a simple statement that I believe to be true:

The best platforms are written by the people who are forced to invent them as they make a product.

Years ago I learned a bit about J2EE; I never actually wrote an app using it, but I learned enough to get a sense of it. I came away with the very strong impression that the people working on it were driven by committee, with managers in their respective contributing corporations telling them what to do. They weren’t the same people out in the field writing apps with it, day in and day out, under time pressure to produce as much as possible. On the other hand, from the Ruby on Rails Wikipedia page:

David Heinemeier Hansson extracted Ruby on Rails from his work on Basecamp, a project management tool by 37signals (now a web application company).[10]

Now, I’ve never written a Rails app either, but it’s pretty clear from the Internet which one of these wins. Another excellent example is Amazon Web Services. Amazon had built a lot of that infrastructure internally, because they had to in order to run a web shopping site, before CEO Jeff Bezos made the key decision to spin it out as a platform.

And the most topical example here: GTK+ was originally spun out of the GIMP project because Motif sucked. Anyways, some food for thought. Basically, if you’re one of those people in the trenches writing a platform, you should consider asking your manager to let you switch to writing apps for a bit. At least hopefully this blog post reminds me later that I have a few GTK+ apps that I really should get back to hacking on…

The GPL and distributing binaries

Of late, it seems I’ve become the “build guy” in GNOME. One thing I want to clear up is that I don’t actually care about building because I think it’s fun or interesting in and of itself. No, the reason I care about building is that if software doesn’t build, then clearly it’s not being run. And if it’s not being run, then it’s not being tested. And if it’s not being tested, then it will be crap. In other words, a competent build system is necessary for not producing crap (but not sufficient, obviously).

That motivation established, what I want to talk about is the GPL (and the LGPL). Specifically, this section:

The source code for a work means the preferred form of the work for
making modifications to it. For an executable work, complete source
code means all the source code for all modules it contains, plus any
associated interface definition files, plus the scripts used to
control compilation and installation of the executable. However, as a
special exception, the source code distributed need not include
anything that is normally distributed (in either source or binary
form) with the major components (compiler, kernel, and so on) of the
operating system on which the executable runs, unless that component
itself accompanies the executable.

The goal (spirit) of the GPL here is that if you receive binaries, you should be able to rebuild those binaries from source. A number of people who work on GPL’d code probably think that saying “the source is this git SHA1 from this repository” or “the source is this tarball” is enough, but things get interesting with the clause “plus the scripts used to control compilation and installation of the executable”. It’s my understanding that this provision hasn’t been heavily explored, but a rough consensus is embodied here.

For example, an interesting question you might ask is: does that GPL provision apply to network build servers, like the Open Build Service? I actually don’t know the answer, and OBS is under the GPL itself, so clearly they don’t have a problem distributing it. Now, the Xen stuff aside, OBS is basically just a wrapper around running RPM or dpkg, which in turn are just wrappers around Makefiles, which are in turn wrappers around the stuff that actually matters (the code). But still, were I to run a network build server like that, I’d probably be extra careful and embed in each build log the version of the build server (and a link to the source for that version), which they don’t appear to do right now.
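As a rough illustration of that last idea (a sketch of mine, not anything OBS actually does), a build wrapper could stamp its own version and source URL at the top of every build log, so anyone who receives the binaries and logs can locate the exact “scripts used to control compilation” that produced them. The macro values below are made up.

    #include <stdio.h>
    #include <time.h>

    /* Hypothetical values identifying the build service itself. */
    #define BUILDER_VERSION "2.1.0"
    #define BUILDER_SOURCE  "https://example.com/git/build-service"

    /* Write a provenance header into the given build log. */
    static void log_build_provenance (FILE *log)
    {
      time_t now = time (NULL);
      fprintf (log, "# build service %s (source: %s)\n# started: %s",
               BUILDER_VERSION, BUILDER_SOURCE, ctime (&now));
    }

    int main (void)
    {
      /* In practice this would be the per-build log file, not stdout. */
      log_build_provenance (stdout);
      return 0;
    }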

How about the WebOS sources? There are no links there to any scripts or details about their build system, or in general any clues for how you’d take those sources, rebuild them, and update your WebOS device with the updated binaries. Maybe that’s covered elsewhere; I didn’t look extensively.

I also recently looked at Zeroinstall, and I think they’re flirting with non-compliance, at least in some cases. For example, there’s little information on the ClanBomber feed about how it was built (with what version of 0install, on top of what version of what underlying distribution, and so on).

To be clear, I strongly believe in the GPL (and LGPL); they’re a key foundation of our community and what we’re building. But complying with these provisions is not as easy as one might think, and I’d argue that this requirement is a key driver of “packages” as you see them in the Debian and RPM worlds. They’re about having a full story for how you can reliably rebuild software and reinstall it on your computer. All the rest of the stuff they also do (configuration management, being able to dynamically turn a minimal install image into a desktop, etc.) is secondary.

TL;DR: If you’re distributing binaries of GPL’d (or LGPL’d) software, “packages” (Debian/RPM style) are basically the state of the art for self-hosting builds. For cross builds, the Yocto project exudes competence. If you’re not using one of those systems and you skipped to this TL;DR section, you should go back to the top and read the whole post.

OS APIs: Windows 8/WinRT and GNOME/GTK+

If one is making an operating system, clearly the API that application authors use is extremely important; the whole point of an operating system is to run applications. What I want to talk about is fundamental APIs, i.e. the lowest stable level.

If you have even a passing awareness of the evolution of Microsoft Windows over the last 25 years, you know there have been a lot of APIs that have appeared, been promoted and marketed to Windows developers, and then either been deprecated or relegated to just an “option”. For example, MFC. However, these APIs are all wrappers. The only time the fundamental Windows API broke incompatibly (I believe) is that 16-bit applications don’t run on 64-bit Windows; see x86-64. But the point here is that since the introduction of 32-bit Windows NT in 1993, if you coded to that API, your application still runs. And if your application still runs, that means it’s the same operating system.

More recently, Microsoft was for a while promoting .NET heavily, arguably more than any of their earlier frameworks. It’s important to understand that at its introduction, .NET was still fundamentally a wrapper. For a while there were rumors that it might become a fundamental API, but with the introduction of Windows 8, that won’t happen. For a really great read on this, see this Ars Technica article. Especially fun are the bits about the politics, which the article only mentions in passing; you can find more scattered about the internet, like here.

Now Microsoft is saying something huge: all new APIs will be based on this new WinRT thing. Your old Win32 apps won’t break, but this time we have reason to believe they’re pretty serious: this really is the new fundamental API, and if you want your app to use new features, you will have to use WinRT. You can access WinRT from plain, bog-standard C, even if it’s not beautiful.

How does all of this relate to GNOME and GTK+? I think WinRT validates what we’ve been doing in GNOME with GObject Introspection. GObject may not be the most concise thing in the world (though it definitely beats C/COM), but the combination of a pure C base with added metadata and runtime support means that all of our fundamental APIs (basically the GTK+ stack, and notably GIO for non-GUI programs) remain accessible from C and are also available in other runtimes and languages.
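For readers who haven’t seen it, here is a small, hypothetical example of what that “C plus metadata” looks like in practice: an ordinary C function whose gtk-doc comment carries GObject Introspection annotations such as (nullable) and (transfer full), which the introspection scanner turns into machine-readable GIR/typelib data that language bindings consume. The function itself is made up; only the annotation syntax is the point.

    #include <glib.h>

    /**
     * sample_join_names:
     * @names: (array zero-terminated=1) (element-type utf8): the names to join
     * @separator: (nullable): separator to place between names, or %NULL for a space
     *
     * Joins @names into a single string. A hypothetical API, shown only to
     * illustrate the annotation syntax.
     *
     * Returns: (transfer full): a newly allocated string; free with g_free()
     */
    gchar *
    sample_join_names (const gchar * const *names,
                       const gchar         *separator)
    {
      return g_strjoinv (separator ? separator : " ", (gchar **) names);
    }

From C you just call the function directly; from another runtime the same entry point shows up through the typelib with correct memory management, because the annotations tell the binding who owns the returned string.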

There’s a lot more to do on introspection; we desperately need a complete documentation generator, for example. It’s also pretty clear to me that in order to truly succeed, we need to “downgrade” C to be a consumer of the API rather than the source, i.e. we need to do what Microsoft has done and define interfaces in an IDL. That will be interesting to do while still keeping around the old C APIs that don’t match the projected C binding.

TL;DR – I believe GNOME’s approach of using C with metadata and minimal runtime hooks as a fundamental operating system API is the right course, and we should keep doing what we’re doing.