On software engineering and optimization

From Bryan Cantrill’s blog:

Adding a hook of this nature requires an understanding of the degree to which the underlying code path is performance-critical. That is, to contemplate adding this hook, we needed to ask: how hot is closef(), anyway? Historically in my career as a software engineer, this kind of question would be answered with a combination of ass scratching and hand waving.

Sadly, too true in my experience as well. Today, when I’m reviewing a patch and a performance increase is claimed, I always ask for numbers and methodology. You’d think this would be the norm; most of the advancement in our society over the last few hundred years rests on the scientific method. But the problem is that it’s just too damn easy to modify software. Why bother actually measuring when we can just make a change, find out it’s broken later, then change it again immediately?

This gets to something that’s been on my mind lately, which is that we should only try to optimize for two things: latency and power usage. The nice thing about this is that “traditional” tradeoffs like space versus time are neatly encapsulated by power usage, because RAM, CPUs/GPUs, and hard disks all consume power. Is it a good idea to cache that file in memory (parse the file once, but force the system to retain it in RAM at a constant power draw), or to re-parse it whenever we need it and then discard the data (more CPU draw periodically, less constant RAM draw)? If you’re optimizing for power draw, measuring representative workloads gives you the answer. Even better, power usage is specific to particular machines, which is how real-world optimization works.
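To make the tradeoff concrete, here is a minimal C sketch (mine, not from the original post) of the two strategies: parse once and keep the result resident in RAM, versus re-parse on demand and discard. The “parsing” here is just slurping the file into memory; the point is only the shape of the two code paths. Which one actually wins is exactly the kind of question you answer by measuring power under a representative workload, not by intuition.

    #include <stdio.h>
    #include <stdlib.h>

    typedef struct { char *data; long len; } ParsedFile;

    /* Stand-in "parser": slurp the whole file into memory. */
    static ParsedFile *parse_file (const char *path)
    {
      FILE *f = fopen (path, "rb");
      if (!f)
        return NULL;
      fseek (f, 0, SEEK_END);
      long len = ftell (f);
      rewind (f);
      ParsedFile *p = malloc (sizeof *p);
      p->data = malloc (len > 0 ? len : 1);
      p->len = (long) fread (p->data, 1, (size_t) len, f);
      fclose (f);
      return p;
    }

    static void parsed_file_free (ParsedFile *p)
    {
      if (!p)
        return;
      free (p->data);
      free (p);
    }

    /* Strategy A: parse once, keep it resident (one-time CPU cost, constant RAM draw). */
    static ParsedFile *cached = NULL;
    static ParsedFile *get_cached (const char *path)
    {
      if (!cached)
        cached = parse_file (path);
      return cached;
    }

    /* Strategy B: re-parse on every use and discard (periodic CPU draw, no resident RAM). */
    static void use_once (const char *path)
    {
      ParsedFile *p = parse_file (path);
      if (p)
        printf ("parsed %ld bytes\n", p->len);
      parsed_file_free (p);
    }

    int main (void)
    {
      const char *path = "/etc/hostname";   /* arbitrary example input */
      if (get_cached (path))
        printf ("cached %ld bytes\n", cached->len);
      use_once (path);
      parsed_file_free (cached);
      return 0;
    }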

Definition of “upstream”

There’s a lot of terminology we tend to use in the Free Software community, but we lack any kind of widely accepted dictionary for our “industry jargon”. Wikipedia has pages on some of this, but Wikipedia isn’t the same thing as a dictionary.

Anyways, I want to attempt a definition for “upstream”:

upstream(n): A FOSS project with an active and robust peer-review process.

I rely here on the definitions of both “FOSS” and “project”. The Wikipedia page for FOSS is a good enough substitute for a dictionary entry, and let’s ignore for now the possible meanings of “project”. The emphasis in my definition is on “active and robust peer-review process”. Why is that?

Because basically, without peer review, there’s no interesting difference between, say, a Debian “package” (what many people seem to consider “downstream”) and a git repository on SourceForge (what people consider “upstream”). There’s no point in saying “push this change upstream” if that just means the change gets added to a git repository without robust inspection. All that happened was that some bytes got copied across the Internet from point A to point B.

GNOME as a platform

In the previous post, I discussed platforms and their relationship to “projects” and “products”. While I was writing it, I had in mind an old blog post from Havoc. It took me a while to find it…can’t believe it’s been 6 years. Anyways, you should go and read that post before continuing. Here’s the link again.

What I’d like to argue (and most of you probably agree) is that GNOME shouldn’t explicitly take the “building block” or “platform” approach. There are multiple reasons for this, but the strongest, I think, is that if we focus just on making a Free Software desktop that doesn’t suck, we will produce a platform as a side effect. And in fact, that’s exactly what has happened. Think of NetworkManager, for example. Getting a network experience (particularly with wireless) that was remotely competitive with Windows XP required us to invent a new networking system.

If we just said “we’re a bucket of parts” and weren’t the ones actually out in front trying to build a networking user interface, there would basically be no obvious driver for a networking API (besides toys and tests), so it wouldn’t be tested, and in practice it wouldn’t really work. Or at least, there would be an immense lag between a third-party engineer reporting problems with the API and getting them fixed.

Will third parties take the code and do things with it? Of course. And that’s allowed by the fact that GNOME is Free Software, and we want to “support” that for some values of “support”.

One thing bears mentioning: of course GNOME should be a platform for application authors. That’s in fact an important part of our place in the ecosystem. But as far as being a collection of parts versus something more, here’s the way I think of it: if you can walk up to a computer and say “Oh, that’s running GNOME”, i.e. we have a coherent design and visual identity, then we’re succeeding.

GNOME is not unique in being an end-user-focused Free Software project debating the platform versus project/product issue. See also the Mozilla platform versus Firefox; the role and relationship of those two have been the subject of (sometimes very contentious) debate in that community. And that’s fine: debating the line is good, as long as you keep producing something that doesn’t suck while you debate =)

Platforms as a side effect

What I want to talk about here is a simple statement that I believe to be true:

The best platforms are written by the people who are forced to invent them as they make a product.

Years ago I learned a bit about J2EE; I never actually wrote an app using it, but I learned enough to get a sense of it. I came away with the very strong impression that the people working on it were driven by committee, with managers in their respective contributing corporations telling them what to do. They weren’t the same people out in the field writing apps with it, day in and day out, under time pressure to produce as much as possible. On the other hand, from the Ruby on Rails Wikipedia page:

David Heinemeier Hansson extracted Ruby on Rails from his work on Basecamp, a project management tool by 37signals (now a web application company).[10]

Now, I’ve never written a Rails app either, but it’s pretty clear from the Internet which one of these wins. Another excellent example is Amazon Web Services. Amazon had built a lot of that infrastructure internally, because they had to in order to run a web shopping site, before CEO Jeff Bezos made the key decision to spin it out as a platform.

And the most topical example here: GTK+ was originally spun out of the GIMP project because Motif sucked. Anyways, some food for thought. Basically, if you’re one of those people in the trenches writing a platform, you should consider asking your manager to let you switch to writing apps for a bit. At least hopefully this blog post reminds me later that I have a few GTK+ apps that I really should get back to hacking on…

The GPL and distributing binaries

Of late, it seems I’ve become the “build guy” in GNOME. One thing I want to clear up is that I don’t actually care about building because I think it’s fun or interesting in and of itself. No, the reason I care about building is that if software doesn’t build, then clearly it’s not being run. And if it’s not being run, then it’s not being tested. And if it’s not being tested, then it will be crap. In other words, a competent build system is necessary for not producing crap (but not sufficient, obviously).

That motivation established, what I want to talk about is the GPL (and the LGPL). Specifically, this section:

The source code for a work means the preferred form of the work for
making modifications to it. For an executable work, complete source
code means all the source code for all modules it contains, plus any
associated interface definition files, plus the scripts used to
control compilation and installation of the executable. However, as a
special exception, the source code distributed need not include
anything that is normally distributed (in either source or binary
form) with the major components (compiler, kernel, and so on) of the
operating system on which the executable runs, unless that component
itself accompanies the executable.

The goal (spirit) of the GPL here is that if you receive binaries, you should be able to rebuild those binaries from source. A number of people who work on GPL’d code probably think that saying “the source is this git SHA1 from this repository” or “the source is this tarball” is enough, but things get interesting with the clause “plus the scripts used to control compilation and installation of the executable”. It’s my understanding that this provision hasn’t been heavily explored, but a rough consensus is embodied here.

For example, an interesting question you might ask is: does that GPL provision apply to network build servers, like the Open Build Service? I actually don’t know the answer, and OBS is under the GPL itself, so clearly they don’t have a problem distributing it. Now, the Xen stuff aside, OBS is basically just a wrapper around running RPM or dpkg, which in turn are just wrappers around Makefiles, which are in turn wrappers around the stuff that actually matters (the code). But still, were I to run a network build server like that, I’d probably be extra careful and embed in each build log the version of the build server (and a link to the source for that version), which they don’t appear to do right now.
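As a rough illustration of that last idea (a sketch of mine, not anything OBS actually does), a build wrapper could stamp its own version and source URL at the top of every build log, so anyone who receives the binaries and logs can locate the exact “scripts used to control compilation” that produced them. The macro values below are made up.

    #include <stdio.h>
    #include <time.h>

    /* Hypothetical values identifying the build service itself. */
    #define BUILDER_VERSION "2.1.0"
    #define BUILDER_SOURCE  "https://example.com/git/build-service"

    /* Write a provenance header into the given build log. */
    static void log_build_provenance (FILE *log)
    {
      time_t now = time (NULL);
      fprintf (log, "# build service %s (source: %s)\n# started: %s",
               BUILDER_VERSION, BUILDER_SOURCE, ctime (&now));
    }

    int main (void)
    {
      /* In practice this would be the per-build log file, not stdout. */
      log_build_provenance (stdout);
      return 0;
    }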

How about the WebOS sources? There are no links there to any scripts or details about their build system, or in general any clues for how you’d take those sources, rebuild them, and update your WebOS device with the updated binaries. Maybe that’s covered elsewhere; I didn’t look extensively.

I also recently looked at Zeroinstall, and I think they’re flirting with non-compliance, at least in some cases. For example, there’s little information on the ClanBomber feed about how it was built (with what version of 0install, on top of what version of what underlying distribution, and so on).

To be clear, I strongly believe in the GPL (and LGPL); they’re a key foundation of our community and what we’re building. But complying with these provisions is not as easy as one might think, and I’d argue that this requirement is a key driver of “packages” as you see them in the Debian and RPM worlds. They’re about having a full story for how you can reliably rebuild software and reinstall it on your computer. All the rest of the stuff they also do (configuration management, being able to dynamically turn a minimal install image into a desktop, etc.) is secondary.

TL;DR: If you’re distributing binaries of GPL’d (or LGPL’d) software, “packages” (Debian/RPM style) are basically the state of the art for self-hosting builds. For cross builds, the Yocto project exudes competence. If you’re not using one of those systems and you skipped to this TL;DR section, you should go back to the top and read the whole post.

OS APIs: Windows 8/WinRT and GNOME/GTK+

If one is making an operating system, clearly the API that application authors use is extremely important; the whole point of an operating system is to run applications. What I want to talk about is fundamental APIs, i.e. the lowest stable level.

If you have even a passing awareness of the evolution of Microsoft Windows over the last 25 years, you know there have been a lot of APIs that have appeared, been promoted and marketed to Windows developers, and then either been deprecated or relegated to just an “option”. For example, MFC. However, these APIs are all wrappers. The only time the fundamental Windows API broke incompatibly (I believe) is that 16-bit applications don’t run on 64-bit Windows; see x86-64. But the point here is that since the introduction of 32-bit Windows NT in 1993, if you coded to that API, your application still runs. And if your application still runs, that means it’s the same operating system.

More recently, Microsoft was for a while promoting .NET heavily, arguably more than any of their earlier frameworks. It’s important to understand that at its introduction, .NET was still fundamentally a wrapper. For a while there were rumors that it might become a fundamental API, but with the introduction of Windows 8, that won’t happen. For a really great read on this, see this Ars Technica article. Especially fun are the bits about the politics, which the article only mentions in passing; you can find more scattered about the internet, like here.

Now Microsoft is saying something huge: all new APIs will be based on this new WinRT thing. Your old Win32 apps won’t break, but this time we have reason to believe they’re pretty serious: this really is the new fundamental API, and if you want your app to use new features, you will have to use WinRT. You can access WinRT from plain, bog-standard C, even if it’s not beautiful.

How does all of this relate to GNOME and GTK+? I think WinRT validates what we’ve been doing in GNOME with GObject Introspection. GObject may not be the most concise thing in the world (though it definitely beats C/COM), but the combination of a pure C base with added metadata and runtime support means that all of our fundamental APIs (basically the GTK+ stack, and notably GIO for non-GUI programs) remain accessible from C and are also available in other runtimes and languages.
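For readers who haven’t seen it, here is a small, hypothetical example of what that “C plus metadata” looks like in practice: an ordinary C function whose gtk-doc comment carries GObject Introspection annotations such as (nullable) and (transfer full), which the introspection scanner turns into machine-readable GIR/typelib data that language bindings consume. The function itself is made up; only the annotation syntax is the point.

    #include <glib.h>

    /**
     * sample_join_names:
     * @names: (array zero-terminated=1) (element-type utf8): the names to join
     * @separator: (nullable): separator to place between names, or %NULL for a space
     *
     * Joins @names into a single string. A hypothetical API, shown only to
     * illustrate the annotation syntax.
     *
     * Returns: (transfer full): a newly allocated string; free with g_free()
     */
    gchar *
    sample_join_names (const gchar * const *names,
                       const gchar         *separator)
    {
      return g_strjoinv (separator ? separator : " ", (gchar **) names);
    }

From C you just call the function directly; from another runtime the same entry point shows up through the typelib with correct memory management, because the annotations tell the binding who owns the returned string.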

There’s a lot more to do on introspection; we desperately need a complete documentation generator, for example. It’s also pretty clear to me that in order to truly succeed, we need to “downgrade” C to be a consumer of the API rather than the source, i.e. we need to do what Microsoft has done and define interfaces in an IDL. That will be interesting to do while still keeping around the old C APIs that don’t match the projected C binding.

TL;DR – I believe GNOME’s approach of using C with metadata and minimal runtime hooks as a fundamental operating system API is the right course, and we should keep doing what we’re doing.