GNOME Boston Summit, plus: why hacking on GNOME is fun!

The Boston Summit has been announced! I’m looking forward to it; there was a lot of positive stuff at GUADEC, and more should happen here in Boston.

With the announcement done, I want to mention why I find working on GNOME fun: there are a lot of hard challenges in writing client-side operating system code, particularly around user interfaces. For example, this bug involves the intersection of X11, multithreading, garbage collection, how GC differs between CPython and SpiderMonkey, and the cross-platform nature of GTK+. It’s really not an easy problem; there are difficult tradeoffs to be made between complexity and speed in different components. But solving these kinds of difficult issues is what I find rewarding as an engineer. And there are certainly plenty more to solve in the GNOME context!

On asynchronous/event-driven programming, and why it lies at the heart of GTK+ (and thus GNOME)

Von Neumann was missing some hardware

When I was in college, we never learned about event loops (we also weren’t formally taught revision control, which is even more dire, but that’s another story). My early introduction to programming was basically all sequential. While taking courses on processor/memory architecture and assembler at the same time, I had an epiphany: it was all incredibly simple. The code I write gets compiled into machine code that the processor executes, modifying memory and jumping around, and there are some special calls to talk to devices. My feeling was that everything else was just sugar on top of the fundamental Von Neumann architecture.

It was only when I really decided to get into GNOME that I was introduced (indirectly, via GTK+) to event-driven programming. Now, all of a sudden, my program interacts with other programs, and all sorts of things can happen in any order. More than that, to make sense of it you really have to understand how time is represented and how operating system schedulers work, all the way down to the hardware. Before, there was of course always an operating system underneath, but when and how exactly my program was scheduled was irrelevant, because it was entirely linear.

The concept of time alone is actually really complex; take the difference between monotonic time and wall-clock time. What’s more, there has to be something in hardware to implement it. Well, OK, people did write code that assumed a fixed CPU frequency, and this resulted in Turbo buttons, a fun bit of computing history. But the point is that the simplistic Von Neumann architecture wasn’t actually a useful mental model anymore.
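In Python, for instance, the two clocks are exposed separately: time.monotonic() for measuring intervals and time.time() for wall-clock timestamps. A minimal illustration (the helper name elapsed is just for this sketch):

```python
import time


def elapsed(fn) -> float:
    """Seconds taken by fn(), measured with the monotonic clock.

    time.monotonic() never goes backwards, so the difference stays a valid
    interval even if NTP or the user adjusts the system clock meanwhile.
    time.time() is wall-clock time and may jump in either direction.
    """
    start = time.monotonic()
    fn()
    return time.monotonic() - start


took = elapsed(lambda: time.sleep(0.05))
print(f"slept for about {took:.3f}s")
# Wall-clock time is what you show to a human or store as a timestamp.
print("now:", time.strftime("%Y-%m-%d %H:%M:%S"))
```

Using time.time() for the interval instead would usually give the same answer, and occasionally a badly wrong (even negative) one.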

GTK+

GTK+ programming requires an event loop because you need to keep drawing to the screen and reacting to user events even while your app is doing something else (most typically blocking on I/O, more rarely being CPU bound). Owen’s talk today at GUADEC was a great reminder of the amount of complexity and coordination involved (it was also a cool talk!). I think the event loop was originally part of GTK+; today it lives in GLib.
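To make the idea concrete, here is a toy main loop in Python. It is a sketch, not GLib: the class and method names (ToyMainLoop, timeout_add, idle_add) are hypothetical, chosen only to echo the g_timeout_add()/g_idle_add() convention where a callback returns True to stay installed. A real GMainLoop also polls file descriptors (the X11 connection, sockets, pipes), which this sketch omits. What it shows is long work chopped into small slices so that timers keep firing in between.

```python
import heapq
import itertools
import time


class ToyMainLoop:
    """A toy single-threaded main loop (hypothetical; not GLib's API).

    Callbacks return True to stay installed or False to be removed, echoing the
    g_timeout_add()/g_idle_add() convention. Unlike GMainLoop, this sketch has
    only timers and idle callbacks and never polls file descriptors.
    """

    def __init__(self):
        self._timers = []              # heap of (deadline, seq, interval, callback)
        self._idle = []                # run only when no timer is due
        self._seq = itertools.count()  # tie-breaker so callbacks are never compared
        self._running = False

    def timeout_add(self, interval_s, callback):
        deadline = time.monotonic() + interval_s
        heapq.heappush(self._timers, (deadline, next(self._seq), interval_s, callback))

    def idle_add(self, callback):
        self._idle.append(callback)

    def quit(self):
        self._running = False

    def run(self):
        self._running = True
        while self._running and (self._timers or self._idle):
            now = time.monotonic()
            if self._timers and self._timers[0][0] <= now:
                # Due timers win over idle work, like GLib's default priorities.
                _, _, interval, cb = heapq.heappop(self._timers)
                if cb():
                    self.timeout_add(interval, cb)
            elif self._idle:
                cb = self._idle.pop(0)
                if cb():
                    self._idle.append(cb)
            else:
                # Nothing runnable: sleep until the next deadline instead of spinning.
                time.sleep(self._timers[0][0] - now)


# Usage: a long job split into small idle slices, with a timer firing in between.
loop = ToyMainLoop()
chunks = iter(range(5))
done = []


def do_work():
    try:
        done.append(next(chunks))  # one small slice of work per call
        return True
    except StopIteration:
        loop.quit()
        return False


def tick():
    print("still responsive; slices done:", len(done))
    return True


loop.timeout_add(0.001, tick)
loop.idle_add(do_work)
loop.run()
```

Note that a callback which blocks (say, on a synchronous read) stalls every other source on the loop until it returns.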

My message to the people I’ve talked to at GUADEC who are just learning GNOME programming: understand that this is the fundamental piece upon which everything else depends. The second most important piece is the big bag of handy pre-written widgets that live in GTK+; but you could imagine writing an app without those, tedious as it might be. And what’s important about the main loop is that it doesn’t really work unless everything in your program/process shares the same one. Getting access to the main loop (and the bag of widgets) is the reason gobject-introspection exists; it’s why you have to learn new ways of doing things instead of just copying “regular” Python, JavaScript, or other examples written for typical sequential programs, which are probably still the most common type of software.

Asynchronously deleting a directory

So I want to give a specific example of using GLib’s extensive asynchronous infrastructure for a fairly common task: recursively deleting a folder. I’ve pushed some example code here; there’s a version written in Gjs, and one in C. One quick note: the example depends on a GLib patch I just wrote. So…use git =)

If you look at the code, it certainly looks very twisted, bouncing around with state. The code doesn’t execute top to bottom (like a sequential version would); rather, it mostly runs in reverse. What’s the advantage of all this pain? Well, let’s say we want to print progress once a second. This is actually quite nontrivial to do in a sequential program. Let me give you a real-world example: git (the actual git program itself). I’m not going to explain the drawbacks of setitimer here; what I do want to show is just how easy it is to do on top of the GLib main loop. Here’s the commit. And if you wanted to do more things at once, such as prompting the user about files which are write-protected, that can still happen while other files are being deleted.
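A rough analogue of the same structure in Python, using asyncio plus worker threads rather than GIO’s asynchronous file operations (so it is not a port of the Gjs or C examples). The name delete_tree and its parameters are hypothetical. Deletions run off the event loop thread while a periodic task prints progress, which is exactly the property that is awkward to get in a purely sequential program.

```python
import asyncio
import contextlib
import os
import tempfile


async def delete_tree(root: str, progress_interval: float = 1.0, max_in_flight: int = 8) -> int:
    """Recursively delete `root`, reporting progress every `progress_interval` seconds.

    Blocking unlink()/rmdir() calls run in worker threads via asyncio.to_thread,
    so the event loop stays free to fire the periodic progress report.
    Returns the number of files removed (symlinks to directories are removed
    but not counted).
    """
    deleted = 0
    in_flight = asyncio.Semaphore(max_in_flight)

    async def report():
        while True:
            await asyncio.sleep(progress_interval)
            print(f"deleted {deleted} files so far")

    async def unlink(path):
        nonlocal deleted
        async with in_flight:
            await asyncio.to_thread(os.unlink, path)
        deleted += 1

    reporter = asyncio.create_task(report())
    try:
        # Bottom-up walk: a directory's contents are handled before the directory.
        # (os.walk itself lists directories synchronously; a fuller version would
        # push that into threads too.)
        for dirpath, dirnames, filenames in os.walk(root, topdown=False):
            await asyncio.gather(*(unlink(os.path.join(dirpath, f)) for f in filenames))
            for d in dirnames:
                p = os.path.join(dirpath, d)
                # Symlinks to directories appear in dirnames but need unlink, not rmdir.
                await asyncio.to_thread(os.unlink if os.path.islink(p) else os.rmdir, p)
        await asyncio.to_thread(os.rmdir, root)
    finally:
        reporter.cancel()
        with contextlib.suppress(asyncio.CancelledError):
            await reporter
    return deleted


# Usage: build a small throwaway tree and delete it.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "a", "b"))
for i in range(20):
    open(os.path.join(root, "a", "b", f"f{i}"), "w").close()
count = asyncio.run(delete_tree(root, progress_interval=0.01))
print(count, os.path.exists(root))
```

Adding another concurrent task (for example, asking about write-protected files) would just be another coroutine on the same loop, running while deletions continue.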

Faster?

One very interesting question I had while writing this was: would it actually be faster than the venerable GNU Coreutils, which is just a synchronous program? Concretely, when rm calls the POSIX unlink(2), the whole program blocks. But if we give the kernel more work to do at one time, it can often make smarter scheduling decisions. It turns out that the asynchronous version is not faster (at least on my laptop). Looking through perf record output, all the threads seem to be getting tangled up in various VFS locks, which is actually not at all surprising: the kernel just isn’t optimized for multiple threads deleting files from a directory while it’s also being traversed. I also suspect that the default CFQ I/O scheduler may be optimized for the common Unix-utility style of synchronous serial I/O over the “random” I/O patterns that asynchronous programming generates.
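A comparison along these lines can be sketched in Python: delete a directory of freshly created files serially, then with several threads issuing unlink(2) concurrently, and report the median of a few runs. The flat directory, empty files, and four worker threads are assumptions of this sketch, not the setup behind the observations above, and no particular result is implied. Results depend heavily on the filesystem and kernel; if tempfile’s default directory is a tmpfs, pass dir= to tempfile.mkdtemp() to measure a real disk.

```python
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor


def make_files(n: int) -> str:
    """Create a fresh temporary directory holding n empty files."""
    d = tempfile.mkdtemp()
    for i in range(n):
        open(os.path.join(d, f"f{i:06d}"), "w").close()
    return d


def delete_serial(d: str) -> None:
    # One unlink(2) at a time, the way a synchronous rm -r works.
    for name in os.listdir(d):
        os.unlink(os.path.join(d, name))
    os.rmdir(d)


def delete_threaded(d: str, workers: int = 4) -> None:
    # Several unlink(2) calls in flight at once (CPython releases the GIL
    # around the system call, so the threads really do overlap in the kernel).
    paths = [os.path.join(d, name) for name in os.listdir(d)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(os.unlink, paths))
    os.rmdir(d)


def median_time(delete, n: int = 1000, runs: int = 3) -> float:
    """Median seconds to delete n freshly created files, over several runs."""
    samples = []
    for _ in range(runs):
        d = make_files(n)  # setup is excluded from the timing
        start = time.perf_counter()
        delete(d)
        samples.append(time.perf_counter() - start)
    samples.sort()
    return samples[len(samples) // 2]


print(f"serial:   {median_time(delete_serial):.4f}s")
print(f"threaded: {median_time(delete_threaded):.4f}s")
```

Running perf record on such a harness is one way to see where the threads spend their time.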

Conclusion

Event-driven programming is the most fundamental part of writing any kind of GUI program, and it’s also very effective in many other programming domains; Node.js (nodejs.org) seems to be the most widely talked about system in this style at the moment, but there have been many before it. Hopefully this post helped explain how some of the fundamental parts of the GNOME/GTK+ stack fit into the wider technological picture.

Efficiency of git versus tarballs for source code transmission and storage over time

In GNOME, for various (mostly historical) reasons, our release process still involves taking our git repositories, running autoconf/automake on developer machines, and uploading the result to the FTP server. One question I had today was: how many times do I as a developer need to download separate versions as tarballs before it would have been more efficient to just download the entire history as a git repository?

The answer obviously varies per repository. It’ll be a function of variables such as the length of the module’s history, whether or not it has large static assets (e.g. PNG images), etc. Let’s take a module I maintain, gobject-introspection. It has a nontrivial history dating back to 2005; it saw a period of peak activity, and development has been fairly constant since then.

What we want to compare here is the size of tarballs to the size of the packfile that git will serve us. Here’s how we do it:


$ ls -al gobject-introspection-1.33.2.tar.xz
-rw-rw-r--. 1 walters walters 1.1M Jun  5 11:58 gobject-introspection-1.33.2.tar.xz
$ git repack -a -d
Counting objects: 18501, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (3499/3499), done.
Writing objects: 100% (18501/18501), done.
Total 18501 (delta 14971), reused 18501 (delta 14971)
$ du -sh .git
7.8M    .git

This means that for gobject-introspection, if I end up downloading the source code more than 7 times in tarball form (xz compressed), it would have been more efficient to download git instead (7.8M / 1.1M ≈ 7.1). How about gtk+, which has a significantly longer history? There the answer is more than 16 times (the current tarball is 13M, the git repository is 213M). How about gnome-documents, which has a much shorter revision control history? Just 3 times!
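The break-even point is just the ratio of the two sizes, rounded down, plus one. A small sketch of that arithmetic, assuming (as a simplification) that every tarball is the same size as the current one; the function name is illustrative:

```python
import math


def break_even_downloads(tarball_size: float, repo_size: float) -> int:
    """Smallest number of tarball downloads whose combined size exceeds one git clone.

    Both sizes must be in the same unit. Simplification: every tarball is
    assumed to be the same size as the current one.
    """
    return math.floor(repo_size / tarball_size) + 1


# Sizes in MB, taken from the measurements above.
for name, tarball, repo in [("gobject-introspection", 1.1, 7.8), ("gtk+", 13, 213)]:
    n = break_even_downloads(tarball, repo)
    print(f"{name}: clone is {repo / tarball:.1f}x a tarball; cloning wins from download #{n}")
```

For gobject-introspection this gives download #8 (that is, more than 7 downloads), and for gtk+ download #17.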

A naive source code storage system based on tarballs would keep each of them as a separate file, so in that case the comparison above for network transmission would also apply to at-rest storage. Anyway, just some data…

On software engineering and optimization

From Bryan Cantrill’s blog:

Adding a hook of this nature requires an understanding of the degree to which the underlying code path is performance-critical. That is, to contemplate adding this hook, we needed to ask: how hot is closef(), anyway? Historically in my career as a software engineer, this kind of question would be answered with a combination of ass scratching and hand waving.

Sadly too true in my experience as well. Today when I’m reviewing a patch and a performance increase is claimed, I always ask for numbers and methodology. You’d think this would be the norm – most of the advancement in our society over the last few hundred years lies with the scientific method, but the problem is it’s just too damn easy to modify software. Why bother actually measuring when we can just make a change, find out it’s broken later, then change it again immediately?

This gets to something that’s been on my mind lately: we should only try to optimize for two things, latency and power usage. The nice thing about this is that “traditional” tradeoffs like space-time are neatly encapsulated by power usage, because RAM, CPUs/GPUs, and hard disks all consume power. Is it a good idea to cache that file in memory (parse it once, but force the system to retain it in RAM at a constant power draw), or to re-parse it periodically when we need it and then discard the data (more CPU draw periodically, less constant RAM draw)? If you’re optimizing for power draw, looking at representative workloads would give you the answer. Even better, power usage is specific to particular machines, which is how real-world optimization works.
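One way to make the cache-versus-reparse question concrete is a simple energy model over a time window: caching costs one parse plus the RAM power for the whole window, while re-parsing costs one parse per use. This model is an assumption added for illustration (it treats RAM power as attributable to a single cache and ignores idle states and wakeups), and the inputs are placeholders meant to be filled in from measurements of a representative workload.

```python
def cheaper_to_cache(parse_joules: float, cache_watts: float, window_s: float, uses: int) -> bool:
    """Is caching a parsed file cheaper in energy than re-parsing it on each use?

    Over a window of window_s seconds with `uses` accesses:
      cache:   parse once, then pay cache_watts of extra RAM power the whole time
      reparse: pay parse_joules on every access, hold nothing in between
    Inputs are meant to come from measuring a representative workload.
    """
    cache_joules = parse_joules + cache_watts * window_s
    reparse_joules = parse_joules * uses
    return cache_joules < reparse_joules


# Purely illustrative inputs, not measurements: 0.5 J per parse, 0.01 W to keep
# the result resident, over one hour.
print(cheaper_to_cache(0.5, 0.01, 3600, uses=10))
print(cheaper_to_cache(0.5, 0.01, 3600, uses=100))
```

With these made-up inputs caching loses at 10 uses an hour and wins at 100; the crossover moves entirely with the measured numbers, which is the point of measuring on the target machine.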

Definition of “upstream”

There’s a lot of terminology we tend to use in the Free Software community, but we lack any kind of widely accepted dictionary for our “industry jargon”. Wikipedia has pages on some of this, but Wikipedia isn’t the same thing as a dictionary.

Anyway, I want to attempt a definition for “upstream”:

upstream(n): A FOSS project with an active and robust peer-review process.

I rely here on the definitions of both “FOSS” and “project”. The Wikipedia page for FOSS is a good enough substitute for a dictionary entry, and let’s ignore for now the possible meanings of “project” here. The emphasis in my definition is on “active and robust peer-review process”. Why is that?

Because without peer review, there’s no interesting difference between, say, a Debian “package” (what many people seem to consider “downstream”) and a git repository on SourceForge (what people consider “upstream”). There’s no point saying “push this change upstream” if that just means it gets added to a git repository without robust inspection. All that happened was that some bytes got copied across the Internet from point A to point B.

GNOME as a platform

In the previous post, I discussed platforms and their relationship to “projects” and “products”. While I was writing it, I had in mind an old blog post from Havoc. It took me a while to find it; I can’t believe it’s been 6 years. Anyway, you should go and read that post before continuing. Here’s the link again.

What I’d like to argue (and most of you probably agree) is that GNOME shouldn’t explicitly take the “building block” or “platform” approach. There are multiple reasons for this, but the strongest one, I think, is that if we focus on just making a Free Software desktop that doesn’t suck, we will produce a platform as a side effect. And in fact, that’s exactly what has happened. Take NetworkManager, for example. Getting a network experience (particularly with wireless) that was remotely competitive with Windows XP required us to invent a new networking system.

If we just said “we’re a bucket of parts”, and weren’t the ones actually out in front trying to make a networking user interface, there would be no obvious driver for a networking API (besides toys/tests), so it wouldn’t be tested, and in practice it wouldn’t really work. Or at least, there would be an immense lag between some third-party engineer telling us about problems with the API and our getting them fixed.

Will third parties take the code and do things with it? Of course. And that’s allowed by the fact that GNOME is Free Software, and we want to “support” that for some values of “support”.

One thing bears mentioning – of course GNOME should be a platform for application authors. That’s in fact an important part of our place in the ecosystem. But as far as being a collection of parts versus something more, here’s the way I think of it: if you can walk up to a computer and say “Oh that’s running GNOME”, i.e. we have a coherent design and visual identity, then we’re succeeding.

GNOME is not unique in being an “end-user” focused Free Software project debating the platform versus project/product issue. See also the Mozilla platform versus Firefox. The role of and relationship between those two have been the subject of (sometimes very contentious) debate in that community. And that’s fine – debating the line is good. As long as you keep producing something that doesn’t suck while debating =)

Platforms as a side effect

What I want to talk about here is a simple statement that I believe to be true:

The best platforms are written by the people who are forced to invent them as they make a product.

Years ago I learned a bit about J2EE; I never actually wrote an app using it, but learned enough to get a sense of it. I came away with the very strong impression that the people working on it were driven by committee, with managers at their respective contributing corporations telling them what to do. They weren’t the same people out in the field writing apps with it, day in and day out, under time pressure to produce as much as possible. On the other hand, from the Ruby on Rails Wikipedia article:

David Heinemeier Hansson extracted Ruby on Rails from his work on Basecamp, a project management tool by 37signals (now a web application company).[10]

Now, I’ve never written a Rails app either, but it’s pretty clear from the Internet which one of these wins. Another excellent example is Amazon Web Services. Amazon had built much of this internally because they were forced to in order to run a web shopping site, before CEO Jeff Bezos made the key decision to spin it off as a platform.

And the most topical example here: GTK+ was originally spun out of the GIMP project because Motif sucked. Anyway, some food for thought. If you’re one of those people in the trenches writing a platform, you should consider asking your manager to switch to writing apps for a bit. At the very least, I hope this blog post reminds me later that I have a few GTK+ apps that I really should get back to hacking on…

The GPL and distributing binaries

Lately, it seems I’ve become the “build guy” in GNOME. One thing I want to clear up is that I don’t actually care about building because I think it’s fun or interesting in and of itself. No, the reason I care about building is that if software doesn’t build, then clearly it’s not being run. And if it’s not being run, then it’s not being tested. And if it’s not tested, then it will be crap. In other words, a competent build system is necessary for not producing crap (but obviously not sufficient).

That motivation established, what I want to talk about is the GPL (and the LGPL). Specifically, this section:

The source code for a work means the preferred form of the work for
making modifications to it. For an executable work, complete source
code means all the source code for all modules it contains, plus any
associated interface definition files, plus the scripts used to
control compilation and installation of the executable. However, as a
special exception, the source code distributed need not include
anything that is normally distributed (in either source or binary
form) with the major components (compiler, kernel, and so on) of the
operating system on which the executable runs, unless that component
itself accompanies the executable.

The goal (spirit) of the GPL here is that if you receive binaries, you should be able to rebuild those binaries from source. A number of people who work on GPL’d code probably think that saying “The source is this git SHA1 from this repository” or “The source is this tarball” is enough, but things get interesting with the phrase “plus the scripts used to control compilation and installation of the executable”. It’s my understanding that this provision hasn’t been heavily explored, but a rough consensus is embodied here.

For example, an interesting question you might ask is – does that GPL provision apply to network build servers, like Open Build Service? I actually don’t know the answer, and OBS is under the GPL itself, so clearly they don’t have a problem distributing it. Now, the Xen stuff aside, OBS is basically just a wrapper around running RPM or dpkg, which in turn are just wrappers around Makefiles, which are in turn wrappers around the stuff that actually matters (the code). But still – were I to run a network build server like that, I’d probably be extra careful and embed in each build log the version of the build server (and link to source for that version), which they don’t appear to do right now.
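A minimal sketch of that suggestion, assuming a build server written in Python: prepend a provenance header to each build log. The function name and the header field names are invented for illustration, and the URLs use the reserved example.org domain.

```python
import datetime
import platform


def build_log_header(server_version: str, server_source_url: str, source_ref: str) -> str:
    """Return a provenance header to prepend to a build log (hypothetical format).

    Records which build-server version produced the binaries and where that
    version's own source lives, next to the exact source that was built.
    """
    lines = [
        f"build-server-version: {server_version}",
        f"build-server-source: {server_source_url}",
        f"built-source: {source_ref}",
        f"build-host: {platform.platform()}",
        f"build-time-utc: {datetime.datetime.now(datetime.timezone.utc).isoformat()}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical values for illustration only.
print(build_log_header("2.3.1", "https://example.org/buildserver/src/2.3.1.tar.gz",
                       "https://example.org/app.git@0123abcd"))
```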

How about the WebOS sources? There are no links there to any scripts or details about their build system, or in general any clues as to how you’d take those sources, rebuild them, and update your WebOS device with the updated binaries. Maybe that’s covered elsewhere; I didn’t look extensively.

I also recently looked at Zeroinstall, and I think they’re flirting with non-compliance, at least in some cases. For example, there’s little information on the ClanBomber feed about how it was built (with what version of 0install, with what version of what underlying distribution, etc.)

To be clear: I strongly believe in the GPL (and LGPL); they’re a key foundation of our community and what we’re building. But complying with the provisions is not as easy as one might think, and I’d argue that this requirement is a key driver of “packages” as you see them in the Debian and RPM worlds. They’re about having a full story for how you can reliably rebuild software and reinstall it on your computer. Everything else they also do (configuration management, being able to dynamically turn a minimal install image into a desktop, etc.) is secondary.

TL;DR – If distributing binaries of GPL’d (or LGPL) software, “packages” (Debian/RPM style) for self-hosting builds are basically the state of the art. For cross builds, the Yocto project exudes competence. If you’re not using one of those systems and you skipped to this TL;DR section, you should go back to the top and read the whole post.

OS APIs: Windows 8/WinRT and GNOME/GTK+

If one is making an operating system, the API that application authors use is clearly extremely important; the whole point of an operating system is to run applications. What I want to talk about is fundamental APIs, i.e. the lowest stable level.

If you have even a passing awareness of the evolution of Microsoft Windows over the last 25 years, you know there have been a lot of APIs that appeared, were promoted and marketed to Windows developers, and were then either deprecated or relegated to just an “option”; MFC, for example. However, these APIs are all wrappers. The only time the fundamental Windows API broke incompatibly was (I believe) when 16-bit applications stopped running on 64-bit Windows; see x86-64. But the point here is that since the introduction of 32-bit Windows NT in 1993, if you coded to that API, your application will still run. If your application still runs, that means it’s the same operating system.

More recently, Microsoft spent a while promoting .NET heavily, arguably more than any of their frameworks before. It’s important to understand that at its introduction, .NET was still fundamentally a wrapper. For a while there were rumors that it might become a fundamental API, but with the introduction of Windows 8, that won’t happen. For a really great read on this, see this Ars Technica article. Especially fun to read are the bits about the politics, which the article only mentions in passing; you can find more scattered about the internet, like here.

Now Microsoft is saying something huge: all new APIs will be based on this new WinRT thing. Your old Win32 apps won’t break, but this time we have reason to believe they’re pretty serious: this really is the new fundamental API, and if you want your app to use new features, you will have to use WinRT. You can access WinRT from plain bog-standard C, even if it’s not beautiful.

How does all of this relate to GNOME and GTK+? I think WinRT validates what we’ve been doing in GNOME with GObject Introspection. GObject may not be the most concise thing in the world (though it definitely beats C/COM), but the combination of a pure C base with added metadata and runtime support means that all of our fundamental APIs (basically the GTK+ stack, and notably GIO for non-GUI programs) remain accessible to C and are also available in other runtimes and languages.
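A loose analogy for the “C base plus metadata” idea, using Python’s standard ctypes module: a table describing C function signatures is enough for another runtime to call those functions, with no hand-written glue per function. This is only an illustration. Real GObject Introspection reads .gir/.typelib data via libgirepository and covers far more (objects, signals, ownership transfer), and the LIBC_METADATA table and bind() helper here are hypothetical. The sketch assumes a POSIX system.

```python
import ctypes
import ctypes.util

# Hypothetical hand-written metadata for two libc functions: name -> (return
# type, argument types). This stands in for what a typelib records; it is not
# the GObject Introspection format.
LIBC_METADATA = {
    "strlen": (ctypes.c_size_t, [ctypes.c_char_p]),
    "abs": (ctypes.c_int, [ctypes.c_int]),
}


def bind(library, metadata):
    """Create callable Python wrappers for C functions from a metadata table."""
    bound = {}
    for name, (restype, argtypes) in metadata.items():
        fn = getattr(library, name)  # look up the C symbol
        fn.restype = restype         # tell ctypes how to convert the result
        fn.argtypes = argtypes       # ...and how to convert each argument
        bound[name] = fn
    return bound


# On POSIX systems; if find_library() returns None, CDLL(None) falls back to the
# symbols already loaded into the Python process, which include libc.
libc = ctypes.CDLL(ctypes.util.find_library("c"))
api = bind(libc, LIBC_METADATA)
print(api["strlen"](b"GObject"), api["abs"](-42))
```

The C functions themselves know nothing about Python; all the binding knowledge lives in the metadata, which is the property that lets one C library serve many languages.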

There’s a lot more to do on introspection; we desperately need a complete documentation generator, for example. It’s also pretty clear to me that in order to truly succeed, we need to “downgrade” C to be a consumer of the API rather than a source, i.e. we need to do what Microsoft has done and define interfaces in an IDL. That will be interesting to do while still keeping around the old C APIs that don’t match the projected C binding.

TL;DR – I believe GNOME’s approach of using C with metadata and minimal runtime hooks as a fundamental operating system API is the right course, and we should keep doing what we’re doing.