On asynchronous/event-driven programming, and why it lies at the heart of GTK+ (and thus GNOME)

Von Neumann was missing some hardware

When I was in college, we never learned about event loops (we also weren’t really taught revision control formally, which is even more dire, but that’s another story). My early introduction to programming was all basically sequential. Taking courses on processor/memory architecture and assembler at the same time, at some point there was an epiphany when I realized it was all incredibly simple – the code I write gets compiled into machine code that the processor executes, modifying memory and jumping around, and there are some special calls to talk to devices. My feeling was everything else was just sugar on top of the fundamental Von Neumann architecture.

It was only when I really decided to get into GNOME that I was introduced (indirectly via GTK+) to event-driven programming. Now, all of a sudden, my program interacts with other programs, and all sorts of things can happen in any order. More than that, you really have to understand the representation of both time and how operating system schedulers work to make sense of it (down to the hardware). While of course there was always an operating system underneath, when and how exactly my program was scheduled was irrelevant, because it was entirely linear.

The concept of time alone is actually really complex – take the difference between monotonic time versus the wall clock. What’s more, there has to be something in hardware to implement this. Well, OK, people did write code that assumed a fixed frequency of the CPU, and this resulted in Turbo buttons, a fun bit of computing history. But the point is that the simplistic Von Neumann architecture wasn’t actually a useful mental model anymore.

GTK+

The reason GTK+ programming requires an event loop is because you need to keep drawing to the screen, reacting to user events, even if your app is doing something else (most typically blocking on I/O, more rarely you’re CPU bound). Owen’s talk today at GUADEC was a great reminder of the amount of complexity and coordination involved (It was also a cool talk!).While I think originally the event loop was part of GTK+, it today lives in GLib.

My message here to people I’ve talked to at GUADEC who are just learning GNOME programming is to understand that this bit is the fundamental piece upon which everything else depends. The second most important bit is the big bag of handy pre-written widgets that live in GTK+; but you could imagine writing an app without that, tedious as it might be. And what’s important about the main loop is it doesn’t really work unless everything in your program/process shares the same one. Getting access to the main loop (and the bag of widgets) is the reason why gobject-introspection exists; it’s why you have to learn new ways of doing things instead of just taking “regular” Python, JavaScript, or whatever examples you might find from typical sequential programs that is probably still the most common type of software.

Asynchronously deleting a directory

So I want to give a specific example of how it’s very interesting to use GLib’s extensive asynchronous infrastructure for a fairly common task – recursively deleting a folder. I’ve pushed some example code here – there’s a version written in Gjs, and one in C. One quick note – I actually just wrote a GLib patch necessary for the example. So…use git =)

If you look at the code, it certainly looks very twisted, bouncing around with state. The code doesn’t execute top to bottom (like a sequential version would); rather mostly the reverse. What’s the advantage of all of this pain? Well, let’s say we want to print progress once a second. This is actually quite nontrivial to do in a sequential program. Let me give you a real world example – git (git the actual program itself). I’m not going to explain the drawbacks of setitimer here; what I do want to show is just how easy it is to do on top of the GLib main loop. Here’s the commit. And if you wanted to do more things at once, such as query for user input on files which are write-protected, that can still happen while other files are being deleted.

Faster?

One very interesting question I had when I was writing this was – would it actually be faster than the venerable GNU Coreutils, which is just a synchronous program? Concretely, when it calls the POSIX unlink(2) call – the whole program is blocked. But if we give the kernel more work to do at one time, it can often make smarter scheduling decisions. This turns out to not be the case (at least on my laptop). Looking through perf record, it looks like all the threads are getting tangled up in various VFS locks, which is actually not at all surprising – it’s just not optimized for multiple threads deleting files from a directory while it’s also being traversed. I also have a suspicion that the default CFQ scheduling may be optimized for the common Unix-utility style synchronous serial I/O over the “random” I/O patterns that asynchronous programming generates.

Conclusion

Event driven programming is the most fundamental part of writing any kind of GUI program, and it’s also very effective for many other programming domains too; nodejs.org seems to be the currently most widely talked about system that has this same style, but there have been many in the past too. Hopefully this post helped explain how some of the fundamental parts of the GNOME/GTK+ stack fit into wider technological picture.

9 comments

Benjamin Otte says:|

July 29, 2012 at 12:33 am

Just for clarification: It’s slower because of VFS locks in the kernel or inside gio?

- Colin Walters says:|
  
  July 29, 2012 at 6:09 am
  
  The kernel. Note I set GIO_USE_VFS=local, because otherwise the threads all spend most of their time blocked on sending file delete notifications over DBus.
  
M Welinder says:|

July 29, 2012 at 1:32 am

I have a program somewhere that uses massive
parallelism to stat(2) all files in a subdirectory, i.e., the
same as yours with s/unlink/stat/. I thought
the kernel could safe tons of time by sweeping over
the disk in one direction at a time. Or something.

It brought the machine to its knees.

This was ~5 years ago, but I doubt things have change.
In part because I never followed it up with a proper bug
report. My bad.

Jasper St. Pierre says:|

July 29, 2012 at 1:34 am

“If you look at the code, it certainly looks very twisted”

oh ho ho

liam says:|

July 29, 2012 at 9:12 am

CommonJS has a proposal for a feature callled Promises which should provide help for devs used to writing sequential code. Of course, you could also go with any of the many functional languages (like OCaml!), but how can you pick just one?
Regarding your question about cfq, assuming you are right, and it doesn’t get patched, should glib query sysfs about which elevator is in use?

thomasvs says:|

July 29, 2012 at 10:53 am

For me, back in the day, realizing that for real programs interacting with people or other systems, you either have to go async or threaded, was a big step too. You articulated it really well in this article. Realizing that even a simple stat call can block your program completely was one step for me. (it’s why we ended up pushing GStreamer into a design that used more threads to handle I/O).

Funnily enough, I realized this more through my use of Twisted than through GTK. Twisted makes it explicit from the very beginning; I always felt GTK/GLib tried to keep the details away from you on how that whole main loop thing worked. Yes, of course it was callback-based, but it wasn’t necessarily ever clear from the docs why – I guess I just assumed that’s how everyone had always done things.

So was that an intended pun, twisted ? 🙂

Henri Bergius says:|

July 29, 2012 at 6:22 pm

There are several approaches at managing asynchronous code in the Node.js community, from promises to the async library (https://github.com/caolan/async/) and flow-based programming (http://noflojs.org/).

Given that Node.js can no run GObject libraries via node-gir (http://bergie.iki.fi/blog/node-gir/), it might be a possibility to try how some of these fit into the GNOME world.

l says:|

July 30, 2012 at 8:54 am

Your code is ugly… Write logic not mechanics.

http://jeditoolkit.com/2012/04/26/code-logic-not-mechanics.html

You don’t neccesarily have to do it in this way, but you should in some way achieve:

A for loop with recursion in to sub dirs, that dispatches the events needed for updating the gui.

Anything else is just ugly, and will lead to spaghetti code.

Rich says:|

July 30, 2012 at 9:40 pm

What you need is control inversion and/or co-routines. Write straightforward linear code and have the compiler or runtime turn it into the complex event-driven stuff.