Why OSTree requires “/usr/etc”

While I often lump “dpkg/rpm” together since they’re very similar architecturally, one small but nontrivial difference between them is their implementation of configuration file management: dpkg allows interactivity, rpm does not.

With dpkg, in the case where a file has been modified locally by the user (say /etc/httpd/conf/httpd.conf), dpkg will prompt the admin interactively. At that point, it has only the modified config file and the new default file. Now, personally I think dpkg prompting in the middle of open heart surgery on your root file system is completely insane. You really must use screen or equivalent if doing remote administration. But being able to see what files changed is useful.

RPM is quite the opposite. While it also does open heart surgery, you might see one or two messages related to config files go by in the output spam. At the end, it’s up to you to search for .rpmnew and such in /etc. Of course there does exist tooling on top to detect this after the fact – I’m just talking about the out-of-the-box experience. Furthermore, allowing packagers to choose between %config and %config(noreplace) makes the whole thing very inconsistent from a total system perspective. It’s hard to know what will happen on an upgrade unless you know beforehand what the packager chose.

The handling of /etc in OSTree took me a while of thought. The executive summary is that OSTree requires the existence of /usr/etc, which holds the read-only defaults. Whenever you do an upgrade (more generally, switch trees), OSTree does a basic 3-way merge. It doesn’t attempt to understand the contents of files – if you have modified a config file in any way, that wins. Unlike the simplistic “split partitions” approaches out there, this means that you get new default config files, and that any config files removed in the new tree also vanish.
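
To make the merge rule concrete, here’s a minimal sketch in C of the per-file decision (emphatically not OSTree’s actual code – the real logic also deals with permissions, symlinks, and directories). The three inputs are the running /etc, the old tree’s /usr/etc, and the new tree’s /usr/etc:

#include <stdbool.h>

typedef enum { KEEP_LOCAL, TAKE_NEW, DROP } EtcMergeAction;

static EtcMergeAction
merge_etc_file (bool modified_vs_old_default, /* running /etc differs from old /usr/etc */
                bool exists_in_new_default)   /* present in the new tree's /usr/etc */
{
  if (modified_vs_old_default)
    return KEEP_LOCAL;  /* any local modification wins, content unexamined */
  if (!exists_in_new_default)
    return DROP;        /* unmodified file removed upstream: it vanishes */
  return TAKE_NEW;      /* otherwise pick up the new default */
}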

Unlike the rpm and dpkg model, OSTree is fully atomic. The /etc I’m talking about on upgrades is a separate copy from your running /etc. That allows you to interactively run whatever scripts or programs you want to ensure the system is correctly set up for the next boot, with total safety.

Another cool feature enabled by the existence of /usr/etc is ostree admin config-diff. This is something I always wanted from Unix – it can give you, at any time, all of the files you modified in /etc. This makes it easier to replicate and reproduce system state after the fact. Your OS can have a “reset all configuration” button, and revert back to a pristine /etc at any time.

There are of course other systems out there you can layer on top of dpkg/rpm to improve the situation. etckeeper is a popular one. See also this proposal for /etc/defaults in Ubuntu. etckeeper suffers from being glued on after the fact. Furthermore, you don’t really need to keep a history of the upstream defaults in git – the main feature is having a 3-way merge. And /usr/etc is a simple way to implement it – one that helps OSTree preserve the “feel” of traditional Unix. You can just vi /etc/some/configfile.ini any time you want. The config state isn’t stored in a separate partition (e.g. CoreOS mounts theirs at /media/state). But OSTree still provides atomic upgrades, as the many image-based upgrade systems out there do.

OSTree in action: “rpm-ostree” and switching trees

I’ve released rpm-ostree 2014.3. Up until now, the main public consumer of OSTree has been gnome-continuous, which has operated as a live research project in CI/CD. But from the start, I very intentionally split OSTree out as a project potentially sharable with other build systems (such as dpkg/rpm).

Now that rpm-ostree is a functioning demo, it will help explain the vision and technology behind OSTree much better. Before I do that, let me talk about Continuous briefly. It ships two trees; the refnames are gnome-continuous/buildmaster/x86_64-runtime and gnome-continuous/buildmaster/x86_64-devel-debug. I often just use the terms -runtime and -devel-debug for short, since the other bits are usually implicit. -runtime is a basic install, comparable to what one might get from a default Debian or Fedora type live CD desktop install. -devel-debug is that plus *all* of the developer tools (gcc, automake), plus developer apps (glade, anjuta).

I suspect however that nearly all of the Continuous users (GNOME developers and testers) use -devel-debug. So -runtime is there basically just to demonstrate that we can have multiple trees.

With rpm-ostree though, the set of software in Fedora is much, much larger. And it can be assembled in many, many, many different ways. At the moment, rpm-ostree captures only a very tiny subset of those; the refnames look like this:

fedostree/20/x86_64/base/minimal
fedostree/rawhide/x86_64/base/minimal

Note that the single OSTree repository contains both 20 and rawhide versions of the trees. Here are some more refname examples; I’ve omitted “rawhide” from the list for brevity:

fedostree/20/x86_64/server/docker-io
fedostree/20/x86_64/server/freeipa-server
fedostree/20/x86_64/workstation/gnome/core
fedostree/20/x86_64/workstation/gnome/default

For the actual content of these, take a look at the current products.json which generates these trees. It might also be illuminating to look at some build logs from today.

As I’ve said in a few places, this “medium” level of flexibility puts OSTree between the ultimate flexibility of packages, and the “Here is our OS, enjoy!” model of ChromeOS. Up until now, in my discussions with people about OSTree, I can often see them mentally trying to slot OSTree into either the “yet another package system” box, or the “yet another image update system” box. But it’s certainly not a package system (rather, it complements existing ones), and it’s definitely more flexible than Chromium Autoupdate (but less efficient in some ways). For more details, see OSTree/RelatedProjects.

Okay, so we see now that OSTree allows storing multiple trees (with history). I only very recently landed this commit to OSTree to make switching between trees easier on client machines. Let’s try it; say you’ve installed the “fedostree/20/x86_64/base/minimal” ref. Now you want to change to the tree containing @standard along with docker-io. With the latest OSTree, this is just

# ostree admin switch fedostree/20/x86_64/server/docker-io

Semantically, this is somewhat like doing yum install @standard docker-io – except the transition is fully atomic. Furthermore, we get an exact replication of the tree from the build server; if for example we were transitioning from the fedostree/20/workstation/gnome/core tree, then we wouldn’t get a mix of GNOME packages and docker-io. Our running GNOME tree would be untouched, still on disk, still running – when we reboot, then we’d be in the new server/docker-io tree. If we then ran this from inside the booted docker-io tree:

# ostree admin switch fedostree/20/workstation/gnome/core

Nothing would be downloaded (unless the ref was updated), and we’d be back to GNOME (again, atomically). Throughout this process, our changes to /etc will have a 3-way merge applied, and our data in /var will be carried along. So like image-based updates, OSTree is reliable and safe. But it has a bit of the flexibility of packages – I think OSTree could scale easily to 100 or more trees (with history!) being shipped by a vendor such as Fedora.

And of course, I’m not suggesting removing the packages as raw material – you can still compute your own filesystem trees locally. It needs some work to be made efficient – I’ll talk about that in a future post.

Cancelling computation – GCancellable (or: SIGINT versus threads versus exceptions)

From previous entries, you may have noticed that I find “cross domain” issues interesting – like ones that span programming language runtimes and operating systems simultaneously. An excellent example of this is the generic concept of “cancelling” computation.

What do I mean by cancelling computation? Let’s first define “computation”. Most people using computers are interacting with an operating system, which runs a set of processes. Here, a process is an example of computation. But in more complex software, it often happens that a single process inside has multiple related but independent “tasks” (these tasks could technically be structured as threads, or coroutines inside a single thread). It’s also in some domains quite common for an application to be composed of multiple processes, where child processes communicate back with the parent. The recent Linux addition of cgroups finally gives the operating system kernel a way to logically manage a group of multiple processes as one unit.

Now that we’ve looked at different kinds of “computation”, we can say that “cancelling” that computation means the computation stops before it would have ordinarily completed, and furthermore that only that computation stops. The latter part of this definition is important – I could of course cancel my browser process(es) by pulling the battery from my phone or halting the operating system kernel, but that’s not a really useful implementation of cancellation.

To make all of this less abstract, let’s talk about a simple case – you are logged into a Unix shell, and type “sleep 30”. While that’s running, your shell will be blocked. You can press Control-C, and this results in just that “sleep” process being “cancelled” by the definition above. Under the covers, Control-C results in a SIGINT being generated for that process, and because sleep didn’t install a handler for SIGINT, the default action applies: the kernel terminates the process.
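
For illustration, here’s a minimal C sketch of the other side of that default – once a handler is installed, SIGINT no longer terminates the process:

#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t interrupted = 0;

static void
on_sigint (int signum)
{
  (void) signum;
  interrupted = 1;  /* only set a flag; do the real work outside the handler */
}

int
main (void)
{
  struct sigaction sa;
  memset (&sa, 0, sizeof (sa));
  sa.sa_handler = on_sigint;
  sigaction (SIGINT, &sa, NULL);  /* without this, SIGINT terminates us */

  while (!interrupted)
    sleep (1);  /* sleep() returns early when a signal arrives */

  printf ("Caught Control-C; exiting cleanly\n");
  return 0;
}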

If you’re using systemd for your init program, the operation “systemctl stop foo” will (by default) use the Linux kernel’s cgroup mechanism to stop all processes that were ever invoked from the original foo process.

GCancellable

The above two cases are pretty simple and reliable; they’re managed by the operating system kernel. But what about the case where a single process is composed of a set of logical “tasks”? This is where things get interesting, and where this blog post starts to relate to GNOME: we’re going to talk about the GCancellable.

If you’ve used modern GNOME APIs very much, you’ve almost certainly seen the GCancellable parameter on nearly every method that involves I/O, or more generally, an operation that one might want to cancel. Let’s look at a specific example, g_input_stream_read_async. Let’s further suppose that that GInputStream is the read half of a GSocketConnection – you are writing an application which reads data from the network, such as a Bittorrent client, a web browser, or an ssh client. You want to have a “[X] Stop” button, which just stops the data copying; it should not make your entire application’s UI disappear. Sounds obvious when stated this way, but the operating system kernel doesn’t help us here – userspace code needs to be involved.

On Linux kernel based systems, a GCancellable is really just a wrapper for an eventfd(), which is an optimized version of a standard Unix pipe() for cases where you just want notification of events via a 64-bit counter, not a general byte stream.

Under the hood, when your application calls g_input_stream_read_async(stream, …, cancellable, …), GLib will ensure your application drops into a poll() on both the socket fd and the eventfd, awaiting input. Calling g_cancellable_cancel will cleanly break your application out of the poll(), and manifest as a G_IO_ERROR_CANCELLED error.
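
To make that concrete, here’s a minimal sketch of the pattern (the helper names start_read and on_stop_clicked are hypothetical, and the stream is assumed to come from a GSocketConnection as above):

#include <gio/gio.h>

static GCancellable *cancellable;

static void
on_read_done (GObject *source, GAsyncResult *result, gpointer user_data)
{
  GError *error = NULL;
  gssize n = g_input_stream_read_finish (G_INPUT_STREAM (source), result, &error);

  if (n < 0)
    {
      if (g_error_matches (error, G_IO_ERROR, G_IO_ERROR_CANCELLED))
        g_print ("Transfer stopped by the user\n");  /* the [X] Stop case */
      else
        g_printerr ("Read failed: %s\n", error->message);
      g_error_free (error);
      return;
    }
  /* ...consume n bytes, then queue the next read... */
}

static void
start_read (GInputStream *stream, char *buf, gsize buflen)
{
  cancellable = g_cancellable_new ();
  g_input_stream_read_async (stream, buf, buflen, G_PRIORITY_DEFAULT,
                             cancellable, on_read_done, NULL);
}

/* Called from the Stop button: breaks the pending read out of poll() */
static void
on_stop_clicked (void)
{
  g_cancellable_cancel (cancellable);
}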

I originally wanted to have a comparison with Java, but it would make this blog entry far too long. Suffice to say it’s been an evolutionary learning experience for them, moving away from threads for everything.

The GCancellable model has been very successful for GLib, and its consuming applications such as GNOME (both by language bindings and pure C/C++ applications) – the application programmer can write reliable cancellable asynchronous operations, and under the hood GLib makes efficient use of available operating system primitives.

Montréal summit result: GSubprocess

I always enjoy the annual GNOME (Boston|Montreal) Summit. It’s small, but enough core GNOME contributors show up that it can be a focused and productive event. I’m happy to have finally landed GSubprocess in GLib. The venerable g_spawn_async_with_pipes only gave you raw platform-specific file descriptors, which are hard to work with. GSubprocess instead leverages the standard streams in GLib that provide nice asynchronous I/O.

Furthermore, spawning processes is something that’s easy to get wrong. In particular, constructing the command line as a single string instead of an argument vector makes it very easy to end up with shell injection if some of the arguments, like filenames, are provided by a potentially hostile source. Here the venerable libc system() API and its clones in poorly engineered runtimes like PHP are at fault for making it easy to do the wrong thing.

Furthermore, system() is very inflexible; if you want to do something as simple as capture the output, you have to turn to other APIs like popen(), which shares the same flaws. This was only recently fixed in POSIX with the addition of posix_spawn.

Now, the Python subprocess module is an example of what I consider a well-designed API for spawning child processes. You’re encouraged to give the arguments as an array, not a string. It’s possible to redirect output with pipes, and its communicate() API makes it easy to both read/write and wait for process completion.

When dealing with nontrivial subprocess code, you quite typically need to handle multiple events. This is where Python is weak, and it gets to why I think GSubprocess is interesting; because it (like all the other classes in Gio) leverages GLib’s builtin mainloop, it’s not hard to do things like spawn multiple processes, handling input and output asynchronously, as well as handling whatever other events your application needs to process, such as terminal or X/Wayland input.
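
As a rough sketch of what that looks like in C – the child command here (du -sh /usr) is just a placeholder – spawn a process with its stdout piped and collect the output asynchronously, while the main loop stays free for everything else:

#include <gio/gio.h>

static void
on_communicated (GObject *source, GAsyncResult *result, gpointer user_data)
{
  GMainLoop *loop = user_data;
  GError *error = NULL;
  char *out = NULL;

  if (g_subprocess_communicate_utf8_finish (G_SUBPROCESS (source), result,
                                            &out, NULL, &error))
    g_print ("Output:\n%s", out);
  else
    g_printerr ("Subprocess failed: %s\n", error->message);

  g_free (out);
  g_clear_error (&error);
  g_main_loop_quit (loop);
}

int
main (void)
{
  GError *error = NULL;
  GMainLoop *loop = g_main_loop_new (NULL, FALSE);
  /* The argument vector is an array, so a hostile filename can never
   * be reinterpreted by a shell. */
  GSubprocess *proc = g_subprocess_new (G_SUBPROCESS_FLAGS_STDOUT_PIPE,
                                        &error,
                                        "du", "-sh", "/usr", NULL);
  if (!proc)
    {
      g_printerr ("Spawn failed: %s\n", error->message);
      return 1;
    }
  g_subprocess_communicate_utf8_async (proc, NULL, NULL, on_communicated, loop);
  g_main_loop_run (loop);  /* other event sources keep running meanwhile */
  return 0;
}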

The intent of GSubprocess is to be a robust base on which higher level classes can be written, depending on application need. While most of the time it’s obviously better to use shared libraries, that’s not always possible. Having GSubprocess in GLib fills an obvious gap in the stack, and I’m glad we’ve finally landed it in GLib.

ostree v2013.6 released

I’ve been working on Free Software for a long time now, on many different projects over the years. In approximate order: Emacs, a variety of things in Debian (PowerPC port, GNOME packaging, GPG verification for apt, CDBS, SELinux porting), then Rhythmbox, then many years of Fedora and Red Hat Enterprise Linux, DBus maintenance, a side stint in web services, and of course my favorite project, GNOME. Which in turn includes working on a lot of infrastructure like systemd, accountsservice, gdm, polkit. A lot of these have been very fun to work on, and mostly useful contributions.

But I think the most valuable contribution to Free Software I will have made will be my most recent project, OSTree. I’ve just released version 2013.6.

The documentation provides an overview in more depth, but briefly here: it’s a tool for parallel installation and atomic upgrades of general-purpose Linux-kernel based operating systems, designed to integrate well with a systemd/GNU userspace. You can use it to safely upgrade client machines over plain HTTP, and longer term, underneath package systems (but above the filesystem and block storage layers; use whatever you want there).

I have been obsessed with upgrades for a long time, on and off. (Side note: it’s interesting that KSplice now exists.) How OS upgrades work affects everything in the system architecture, from the development process to end-system reliability. It’s a problem domain that spans client machines, cloud deployments, traditional servers, and embedded devices.

Atomic and safe upgrades

Over the past few years, my interest in the domain rekindled for several reasons. One is that I happened to stumble across NixOS. While obviously there are a lot of software deployment mechanisms out there, if you filter by “has atomic and safe upgrades”, the list becomes quite small, and most of what’s there is quite specialized. So I studied Nix carefully; the executive summary is that while they have cool ideas, rebuilding or redownloading the entire system for a glibc security update is deeply impractical. But their approach of having a symlink farm which is the target of a bootloader entry stuck in my mind.

Also in the last few years, Chromium OS appeared with its autoupdater. The Chromium OS updater is an extremely efficient design…for their use case. But it’s hard to generalize the design to the wider world; doubling the disk space usage in every cloud image is a rather large penalty. Furthermore, the Chromium OS model doesn’t have much of a story for locally generated systems. If you want to customize the OS, you are completely unable to reuse their updates server, as it is all about deltas between fixed disk images. Again, this all makes sense if your model is that the only apps are web apps, but that’s a very fixed use case.

In short, OSTree is more efficient than Nix in a number of ways, and most importantly only handles filesystem trees; it’s not a package system. I posted plans for approaching the efficiency of the Chromium OS updater on the wire.

But, if OSTree is so cool, why isn’t it powering your package system? The simple answer is because it’s really quite deeply invasive for existing package systems like dpkg/rpm and all the others that are basically just clones of the same idea with different names. This quote from a LWN commenter sums it up:

What OSTree is proposing is somewhat unclear, but appears to require rebooting on *every single package upgrade* so as to switch into a new chroot containing that package. That means several times a day for me. Not a bloody chance am I letting something like *that* near any of my systems outside a VM: it is transparently ridiculous and optimizing for a very few programs that might need to take extra measures to avoid being broken by updates happening underneath them…

Right. On the plus side, you get atomic upgrades, and this is a tradeoff that a substantial number of people would likely take. Ultimately of course, as I replied to the commenter, it’s certainly possible to imagine carefully engineering the OS so that a certain subset of changes can be “live applied”, while still preserving atomic upgrades. Furthermore, while OSTree does not come with or force any particular independent application installation mechanism, it is designed to provide a fundamental layer for existing and new ones.

Parallel installation, the OS development process, and system quality

The core of OSTree is so simple – it’s just booting into hardlinked chroots – that it was relatively easy to enable something else besides atomic upgrades, which is easy parallel installation of operating systems. Not only does it make it easy to dual boot say a stable OS and the bleeding edge, if you have the disk space, you can thousand-boot, or more.

Why is this so critical? It’s because while package systems have a lot of flexibility, there’s one extremely important gap: The ability to try new code, and go back if it doesn’t work. This was covered in my original GUADEC presentation. Typically package-based distributions manage this by creating several different layers. Debian has stable, testing, unstable, and experimental. But if you upgrade from stable to unstable to see if suspend works for example, the package system will fight you trying to downgrade; the concept of “newer is better” is baked deeply into dpkg/rpm and everything built on top.

Being a software engineer working on an extremely complex general-purpose system like GNOME without massive development resources, let me tell you – it’s easy to break things unintentionally. Having a subset of users (but not everyone) run the bleeding edge, as Firefox has in Nightly, would be a real benefit, while also giving them a mechanism to fall back to the previous working build. And in fact, I have a separate project, gnome-ostree, that’s intended to be exactly that. Although it has an uninspired name, it’s better than “nightly” – it’s fully continuous, updated easily 70 times a day as git commits are made. But while it serves as an important testing base for validating the core OSTree designs in a relatively constrained scenario, it’s a separate project, and not the topic of this blog post.

OSTree underneath package systems

There are a large number of systems which fit into the model of efficiently replicating pre-constructed OS trees from a build server; many basic “client” workloads as well as cloud deployments are best delivered this way. That said, the “package” model where filesystem trees are computed dynamically on individual machines is very flexible, and some of that flexibility is entirely valid. Particularly for organizations which have invested heavily in it, it doesn’t make sense to toss out that investment; I want to support it.

While I’ve been relatively quiet about OSTree so far, I think it’s finally reached a point in implementation quality and design where I’d like to see more package system maintainers and distributions attempt to experiment with it; that’s the goal of this blog post. A quick weekend hack a while ago resulted in fedora-ostree. Since then, I worked on it a bit more this weekend, and updated it.

This is a long term effort; as the LWN commenter above said, OSTree has wildly different tradeoffs from existing package system semantics. There is a new section of the OSTree manual describing changes that many existing general-purpose distributions will have to make to adapt.

And clearly, hashing out a design where some changes can be applied live (after they are atomically set up for the next boot) would be really nice. If you’re logged into a system and want to zypper/yum/apt-get install strace, that’s just a new file in /usr/bin – there’s no reason we can’t make it appear right away. But as you go up from there in complexity, it gets more difficult to do without race conditions. Luckily, though, we have the complete source code to the operating system; and starting from a fundamental basis of reliability and safety, it is much easier to add features like speed and flexibility.

If you too share my passion for atomic upgrades, operating system upgrade engineering, continuous integration and such, then check out the git repository and join the mailing list; it’s a great time to join the project, as there are several new contributors, and it’s just fun to work on!

GNOME DevX, FOSDEM

Back now from the GNOME Developer Experience hackfest and FOSDEM. I was a bit late arriving, and entered during the perennial Python/JavaScript/vala/etc. discussion.

Developer Experience

Languages

Let me write down my current thoughts on this – they’re a bit nuanced. First, this previous entry still stands. The big picture of the architecture is correct, where a bindable subset of C is the primary interface definition. Another way to say this is that there are no plans to, for example, change GTK+ to require JavaScript.

C/GObject is just the interface definition – it’s possible to write GObject libraries in C++. For example, pango is now partially a frontend around harfbuzz. What’s not possible, and never sanely will be, is some sort of mechanism where one language calls into a component in another.

Let me highlight the excellent work of Jasper St. Pierre on gobject-introspection‘s documentation generation over the hackfest. We’re getting quite close to an initially usable version; this is something people have wanted for a long time.

Applications

This was an interesting discussion – lots of ideas going around. Some of it is quite far from realization (like a complete sandbox), so the key, I think, is going to be breaking it down into individually useful steps that can be iterated on. There’s also a lot of prior art here as well, but the glick2 page explains why it’s not just about bundles – they need to integrate with the system. For example, we still want applications to use GSettings, and system administrators should be able to have a global view of application configuration, with mandatory controls, etc.

Thanks to:

  • BetaGroup Coworking for providing space, and:
  • GNOME foundation and Red Hat for sponsoring travel!

FOSDEM

Awesome, overwhelming, informative, and fun. Lots of great discussions with a wide variety of people across the FOSS world. Really looking forward to next year.

Building everything from source vs self-hosting

In this post, I’m going to answer a seemingly simple question: why does neither Debian nor Fedora have a well-defined and reliable process to rebuild everything from source?

I’m often surprised by how many people I encounter in the FOSS world, even experienced developers, that are either intimidated by the idea of building “everything” from source, or think it’s crazy, or just not worth it. Let’s just assume for the purposes of this discussion that rebuilding from source is valuable. I mean, after all it’s Free Software, not Free Binaries Wrapped With Some Metadata.

First, let me define “everything”: the goal here is to construct a basic Linux-based system, bootable in qemu, and you can log in as root. A perfect example of our goal here is to build the source code that comprises JS/Linux. That means the kernel, bash, glibc, gcc, etc. If you read the tech notes, you’ll see it’s built using Buildroot.

The first thing to observe here is that not only do multiple projects to accomplish this goal exist, they do so with a high degree of reliability, and solve real-world needs. For example, the Yocto project’s “core-image-minimal” target gets you basically this same thing, and you just run bitbake core-image-minimal, and everything else is done for you. Likewise, a quick read of the Buildroot manual will show you just how little needs configuration or manual intervention.

The second thing to note about systems like Buildroot, Yocto, and others is that they are not (by default) self-hosting. The host and target systems need not be the same. For example, you can use Yocto to build “core-image-minimal” from a Red Hat Enterprise Linux 6 system, an Ubuntu 12.04 system, and a variety of others. In fact, you can even do full cross builds from x86_64 to ARM. Now, interestingly, Yocto can generate self-hosting systems, but it’s not the default.

We’re getting closer to answering our original question. Let’s further observe that both Debian and Fedora are defined to be self hosting systems. Why is self hosting a problem? It’s because of circular build dependencies. The classic example of this is gcc, which is written in the C programming language. In order to build it, you need a C compiler already. The Yocto/Buildroot type build systems get out of this problem in a simple way – they assume you already have a functioning gcc on the host system.

But in Debian and Fedora, in order to build the gcc package, you need gcc already built as a package – the build system won’t accept just having a “gcc” binary in the $PATH. That’s how the build systems work because again, that’s how the projects are defined.

If you haven’t done this recently, grab a mirror you can hold in your hand, and go into your bathroom, and point the hand mirror at the wall mirror. You’ll get an infinite recursion. It’s really quite beautiful and fun to do, but since I’m sure many of you won’t, there’s a good picture here.

This infinite recursion resulting from self-hosting is the reason there isn’t one reliable command to rebuild all of Debian or Fedora from source.

One question you might have – would it make sense to have a well-defined process for bootstrapping a self-hosting system like Debian? Some of the developers think so, and the DebianBootstrap wiki page describes the thoughts so far. Personally though, I think it’s both too complex and too vague. A much simpler, and ultimately more reliable, goal would be to ensure that version N of the system can be built by N-1. So Fedora 17 can be built on a Fedora 16 system, Debian Wheezy can be built from Squeeze, Red Hat Enterprise Linux 6 can be built from Red Hat Enterprise Linux 5, etc. Eventually this is a goal I’d like to achieve for Red Hat Enterprise Linux at least. There’d be some cost to packages with circular build dependencies, but having a well defined, reliable process for building from source: priceless.

GNOME Summit 2012 – Friday & Saturday

Friday: Newcomers pre-event

The Newcomers pre-event on Friday evening was very successful! A number of students came from nearby universities like Tufts and MIT, and were able to dive straight into tools for GNOME development like Git, Jhbuild, and of course GTK+ and Gjs.

You can see the bugs they reported (with patches!) in bugzilla. Not everyone reported a bug – creating an account was a hurdle. Still though, the event was a success and likely something to replicate in the future.

Saturday opening

Saturday morning started with coffee and bagels – thanks to SUSE for sponsoring! As the summit is an “un-conference”, we made up a schedule on the fly, which you can see here.

GTK+

Benjamin Otte demoed some new CSS features that are implemented in GTK+ – notably the use of an animated transition for the background of active menu items. He also started a discussion on the challenges in maintaining the 750,000 lines of code in GTK+. We brainstormed about how we can attract more contributors to infrastructure such as GTK+.

Owen Taylor repeated his demo of client-compositor synchronization, covered by LWN previously.

GObject Introspection

Colin Walters discussed the state of introspection and the goals for the 3.7 development cycle – in particular, generating documentation from GIR files is the primary goal. There was an obligatory debate about which programming languages to use/promote in GNOME, along with some questions about how we can improve “bindability” of APIs in GNOME.

GNOME OS

This was a followup from GUADEC, and ended up taking up several hours. It opened with a status update on OSTree, also on LWN previously. The system has been doing successful continuous integration on over 200 git repositories, and notably all of GNOME up to gnome-shell, plus some application dependencies like gtksourceview.

While the current OSTree builder is automated, there was a lot of interest around making it better, and ensuring that when the build does break, both the responsible party and other people know about it. This led into some comparisons with the WebKitGtk+ development process. Concrete action items resulting from this were an IRC bot and an improved web page.

There was general consensus that the composition of the modulesets is best handled by the release-team. A lot of time was spent on the topic of Jhbuild, and its strengths and weaknesses for application authors, GTK+ developers, and core OS hackers. Ryan Lortie described how we could make jhbuild better by having GNOME builders provide distribution-specific binaries, and later in the day could be seen furiously hacking on implementing it.

After lunch, the topic of applications came up. Allan discussed application stores, and how that can provide a good experience around things like centralized updates, and generating revenue for authors.

Following application stores, the topic of application sandboxes came up. Chris Ball mentioned difficulties OLPC had encountered in getting application authors to adapt to their sandboxing scheme. Strengths and weaknesses of the Android model came up, and at this point we agreed to break out sandboxing as a standalone topic for the next day.

Developer toolset

Benjamin Otte started a discussion about our developer tools, and we talked about different kinds of application authors:

  • ourselves (GNOME Documents, Rhythmbox, etc.)
  • iphone/android type app
  • libreoffice-size application
  • enterprise apps (e.g. proprietary creative apps)
  • sysadmin apps

There are challenges in meeting the needs of all of these different kinds of authors.

A11Y

Piñeiro started with a summary of the state of accessibility. 3.6 was a significant achievement in that a11y is always-on by default, and only impacts the system when an accessibility tool is running.

He then talked about 3.7 features, such as improving the magnifier. There were also some plans to improve configurability, and to improve discoverability of existing options.

The ongoing cost of maintaining fallback mode was mentioned briefly. Then he talked about allowing third parties to extend accessibility support when using custom widgets in GTK+.

Finally, two other points were touched on: first, that some new applications (Documents, Clocks) have inaccessible features; second, touch accessibility (for blind users).

3.7 planning

While there are a few features listed for GNOME 3.7 development, pretty much the entire hour of discussion was on fallback mode. As the feature page says, dropping fallback mode would allow us to significantly clean up some GNOME internals; on the other hand, it turned out that Ubuntu Unity is likely relying on some of those internals as well.

The fallback mode users fall into two general categories: those who wanted a more GNOME 2-like experience, and those who were unable to run GNOME 3.

In general, the discussion seemed to follow along with what the feature page already had, and opinions varied. No hard decisions were made; this will be an ongoing discussion.

GNOME Boston Summit, plus: why hacking on GNOME is fun!

The Boston Summit is announced! I’m looking forward to it; there was a lot of positive stuff at GUADEC, and more should happen here in Boston.

Announcement done, one thing I want to mention is why I find working on GNOME fun – there are actually a lot of hard challenges that arise in working on client-side operating system code, particularly around user interfaces. For example, this bug involves the intersection of X11, multithreading, garbage collection, how GC is different between CPython and Spidermonkey, and the cross-platform nature of GTK+. It’s really not an easy problem; there are difficult tradeoffs to be made between complexity and speed in different components. But solving these kinds of difficult issues is what I find rewarding as an engineer. And there are certainly plenty more to solve in the GNOME context!

On asynchronous/event-driven programming, and why it lies at the heart of GTK+ (and thus GNOME)

Von Neumann was missing some hardware

When I was in college, we never learned about event loops (we also weren’t really taught revision control formally, which is even more dire, but that’s another story). My early introduction to programming was all basically sequential. Taking courses on processor/memory architecture and assembler at the same time, at some point there was an epiphany when I realized it was all incredibly simple – the code I write gets compiled into machine code that the processor executes, modifying memory and jumping around, and there are some special calls to talk to devices. My feeling was everything else was just sugar on top of the fundamental Von Neumann architecture.

It was only when I really decided to get into GNOME that I was introduced (indirectly via GTK+) to event-driven programming. Now, all of a sudden, my program interacts with other programs, and all sorts of things can happen in any order. More than that, you really have to understand both the representation of time and how operating system schedulers work to make sense of it (down to the hardware). While of course there was always an operating system underneath, when and how exactly my program was scheduled used to be irrelevant, because it was entirely linear.

The concept of time alone is actually really complex – take the difference between monotonic time and the wall clock. What’s more, there has to be something in hardware to implement this. Well, OK, people did write code that assumed a fixed CPU frequency, and this resulted in Turbo buttons, a fun bit of computing history. But the point is that the simplistic Von Neumann architecture wasn’t actually a useful mental model anymore.
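
As a small illustration (a sketch using GLib’s clock APIs): durations should be measured against the monotonic clock, because the wall clock can jump underneath you:

#include <glib.h>

int
main (void)
{
  gint64 mono_start = g_get_monotonic_time (); /* microseconds, arbitrary epoch */
  gint64 wall_start = g_get_real_time ();      /* microseconds since 1970 */

  g_usleep (G_USEC_PER_SEC);  /* pretend to work for one second */

  g_print ("monotonic elapsed: %" G_GINT64_FORMAT " us\n",
           g_get_monotonic_time () - mono_start);
  /* If NTP or an admin reset the clock meanwhile, this can be anything,
   * even negative. */
  g_print ("wall-clock elapsed: %" G_GINT64_FORMAT " us\n",
           g_get_real_time () - wall_start);
  return 0;
}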

GTK+

The reason GTK+ programming requires an event loop is because you need to keep drawing to the screen and reacting to user events, even if your app is doing something else (most typically blocking on I/O; more rarely you’re CPU bound). Owen’s talk today at GUADEC was a great reminder of the amount of complexity and coordination involved (it was also a cool talk!). While I think the event loop was originally part of GTK+, it today lives in GLib.

My message here to people I’ve talked to at GUADEC who are just learning GNOME programming is to understand that this bit is the fundamental piece upon which everything else depends. The second most important bit is the big bag of handy pre-written widgets that live in GTK+; but you could imagine writing an app without that, tedious as it might be. And what’s important about the main loop is that it doesn’t really work unless everything in your program/process shares the same one. Getting access to the main loop (and the bag of widgets) is the reason why gobject-introspection exists; it’s why you have to learn new ways of doing things instead of just copying “regular” Python, JavaScript, or whatever examples you might find from typical sequential programs – probably still the most common type of software.
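
To make that concrete, here’s about the smallest complete GLib main loop program I can sketch – one timeout source firing as a callback, while the loop stays free to dispatch anything else that becomes ready:

#include <glib.h>

static gboolean
on_tick (gpointer user_data)
{
  static int count = 0;
  g_print ("tick %d\n", ++count);
  if (count == 3)
    {
      g_main_loop_quit (user_data);
      return G_SOURCE_REMOVE;    /* detach this timeout source */
    }
  return G_SOURCE_CONTINUE;      /* keep firing once a second */
}

int
main (void)
{
  GMainLoop *loop = g_main_loop_new (NULL, FALSE);
  g_timeout_add_seconds (1, on_tick, loop);
  g_main_loop_run (loop);  /* from here on, everything happens in callbacks */
  g_main_loop_unref (loop);
  return 0;
}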

Asynchronously deleting a directory

So I want to give a specific example of how it’s very interesting to use GLib’s extensive asynchronous infrastructure for a fairly common task – recursively deleting a folder. I’ve pushed some example code here – there’s a version written in Gjs, and one in C. One quick note – I actually just wrote a GLib patch necessary for the example. So…use git =)

If you look at the code, it certainly looks very twisted, bouncing around with state. The code doesn’t execute top to bottom (like a sequential version would); rather mostly the reverse. What’s the advantage of all of this pain? Well, let’s say we want to print progress once a second. This is actually quite nontrivial to do in a sequential program. Let me give you a real world example – git (git the actual program itself). I’m not going to explain the drawbacks of setitimer here; what I do want to show is just how easy it is to do on top of the GLib main loop. Here’s the commit. And if you wanted to do more things at once, such as querying for user input on files which are write-protected, that can still happen while other files are being deleted.

Faster?

One very interesting question I had when I was writing this was – would it actually be faster than the venerable GNU Coreutils, which is just a synchronous program? Concretely, when it calls the POSIX unlink(2) call – the whole program is blocked. But if we give the kernel more work to do at one time, it can often make smarter scheduling decisions. This turns out to not be the case (at least on my laptop). Looking through perf record, it looks like all the threads are getting tangled up in various VFS locks, which is actually not at all surprising – it’s just not optimized for multiple threads deleting files from a directory while it’s also being traversed. I also have a suspicion that the default CFQ scheduling may be optimized for the common Unix-utility style synchronous serial I/O over the “random” I/O patterns that asynchronous programming generates.

Conclusion

Event-driven programming is the most fundamental part of writing any kind of GUI program, and it’s also very effective for many other programming domains; nodejs.org is currently the most widely talked-about system with this same style, but there have been many in the past too. Hopefully this post helped explain how some of the fundamental parts of the GNOME/GTK+ stack fit into the wider technological picture.