finding algorithms

Dear lazyweb, I recently found myself having a set of Unicode strings (in main memory) and the desire to search all of them for a particular prefix (not substring). For bonus points, let’s say I want to do it in parallel (seems like an easily parallelizable problem, although at worst you can divide-and-conquer for a given number N of processors). The constraints are that the set of strings can change, but not very often usually. I don’t want too large of a hit to compute tables and the like though.

The real question is – how do you find papers (and implementations thereof) for computer science problems like this? My first instinct was to get an ACM subscription, which I did. Their search turns out to be mediocre, and then I discovered Wikipedia has useful pages on these kinds of things. Wikipedia also has the winning feature of linking to quality free software implementations thereof.

On this particular problem, a lot of the literature is around substring matching which isn’t what I want, at least not directly. Still looking.

Advertisements

Dear GObject/C library authors

Get ready – it’s coming. At the GNOME Summit this weekend we’re having a hackfest on introspection. As a programmer working on a GObject library, here’s what you need to understand:

Using gtk-doc annotations

Just parsing the C code isn’t enough – we sometimes need additional metadata. Here’s an example:

/**
* gtk_list_store_set_column_types: 
* @store: a #GtkListStore
* @n_columns:
* @types: <array,length=n_columns>: List of types
*/
void
gtk_list_store_set_column_types (GtkListStore *list_store,
gint          n_columns,
GType        *types);

There are more examples here. This syntax hasn’t really seen much bikeshedding yet, and to be honest it’s grown rather organically. So things could change. But the goal is for the annotations to be concise, fit in with gtk-doc, and support a variety of attributes; we don’t want to require a totally different IDL format.

Add your library to gir-repository and check it

The current thought on deploying introspection is to have two phases. In the first “unstable” phase, we will also have two separate modules: gobject-introspection and gir-repository. It’s unclear yet whether the former will remain separate from glib or not in the long term, but expect it to exist separately for now. For gir-repository, the long term plan is to integrate everything into the relevant modules themselves, but for the next 6 months or so while we’re finalizing bits, we will keep annotations and code to invoke the scanner inside this centralized module.

So for now, you can add your library (if it isn’t there already) to gir-repository by filing a patch in Bugzilla (glib/introspection), or pastebin on irc.gimp.org:#introspection. After you have your library successfully parsed by the scanner, the first task is to inspect the generated .gir file. This XML file describes the parsed API. Try searching in particular for any APIs you think are tricky.

Try out your library in a binding

After inspecting the .gir file, you can try out your API from one of the bindings such as PyBank or JGIR, with more coming. We expect to do some PyBank hacking in particular at the summit, and hopefully get the Spidermonkey and Mono bindings further along. There is a lot to do for all the bindings, but I hope that introspection can be a strong focus point for collaboration between the bindings.

Writing bindable APIs

I’ve started a wiki page about how you can keep your C code easily bindable. More content appreciated! This page will likely be a bit fluid as we determine both what is supported in the introspection core, and how its concepts map to languages.

That’s it for now – remember if you’re in the Boston to come to the summit!