FUDCon Toronto, on the bus

At FUDCon Toronto 2009, finally got a reliable connection at the hack room to give an update. On the bus from Westford to Toronto, I spent some time optimizing the GNOME Shell JavaScript engine Gjs, specifically how we do invocations from JavaScript into our C core.

As always, sysprof is an a great tool, and revealed we had some relatively low-hanging fruit. (Side note – in Fedora 12, sysprof no longer requires an external kernel module, which is awesome). Said fruit was partly in Gjs, partly in the GObject Introspection layer. We could clearly be doing more caching in the latter, but Gjs was also using the basically toy/demo invocation function g_function_info_invoke.

I’ve now got it so the invoker can take a bunch of JavaScript jsval pointers, and schlep them more directly into libffi argument types. The end result is currently a 25% speedup on my benchmark, which is fairly good as far as 5 hours in performance work goes. I’m now looking at some other bits, and I think I’ll be able to squeeze out another 7-10% at least with just another hour or two of work.

Now, in the the farther future, I could imagine teaching SpiderMonkey‘s JIT compiler about how to invoke directly from JS to C (and the reverse); the amount of generic work we still do to take say a single double across is fairly high, but the JIT could know about the platform ABI, and be able to trace right up until a native method, which would be a large speedup.

And for now, I’ll leave you with a picture of a tasty waffle from this morning:

From FUDCon Toronto 2009

Netbook in the coal mine

For a long time my primary workstation was a (now 3 year old) Dell Inspiron E1705, or “bricktop” as ajax liked to call it. However the hinge broke a while ago when I closed my car door while the bag was too close, and when the laptop started to fall apart I decided to get a new machine. Actually, I got two. The first thing I did was get a serious desktop computer, a Dell XPS 630. It’s quite a beast, especially since I got 8GB of RAM with it. I pretty typically have over 2GB of memory used purely for page cache, which is probably my entire working set of files. It’s hooked up to a nice 24″ Dell 1920×1200 monitor. This computer is named megatron.

Needless to say, the machine is fast and nice to use. However, it’s rather unrepresentative of the general personal computing space. So I wanted another machine that was mobile, and also closer to what “most” people use. So I picked up a MSI Wind U123, known as pocket. It has just 1GB of RAM, which is probably the realistic minimum we want to support going forward (ideally we’d be workable with 512MB, and we probably are if you’re just running one application or one or two websites, but…). The machine also has an Intel 945 video, which is also near the lowest end graphics card we’ll be supporting for GNOME Shell.

The difference between the two computers is rather extreme, but I use the netbook very often, and working on it has forced me to optimize some things in the shell. It serves a similar role for performance issues that canaries used to do in coal mines (hence the title of this post). Specifically, I’ve been working on search performance, with fairly good results. We had a few sillies in the old search system like creating lots and lots of ClutterActors we’d never display, not caching lowercased strings, etc. (the other goal by the way of my search work is to move us closer to the current search mockup, which should be cool).

But the biggest performance problem I’ve been running into so far is synchronous I/O. A lot of GNOME libraries like gnome-menus were designed for gnome-panel, which typically you don’t interact with often, and so it’s not a big deal if the process is blocked. However if the shell is blocked, because we’re acting as a compositor, we won’t repaint the desktop or process input, which is a serious user experience problem. An example of this was that we would synchronously stat (check for existence) of recent documents; this could easily take several hundred milliseconds if the data isn’t currently in the kernel cache.

I’ve been reworking our docs handling specifically to be smarter; we use async I/O now, and only stat ones we’re actually going to show in the UI. We could be smarter still, but this has helped a lot. GIO has really nice APIs for this kind of thing.

I mention this for two reasons; first, just because you might be interested in a status update on Shell work. The second is because I’m hoping to Real Soon Now land my extension system patch, so if you create an extension using it, for a good user experience, you’ll need to be very careful about I/O too =)