Polyglot programming

As you may have gathered from previous entries, I’ve recently become interested in programming languages again.
I’m almost done with my copy of the aforementioned Ruby book (though I skimmed some parts), and although I haven’t done any substantial Ruby programming (just playing around in jirb while reading), I think I now have a good idea why so many people love the language.

Closures

Most of my real code over the last several years has been in C, Java, and Python, and I know those languages and their runtime libraries pretty well, but reading the Ruby book I was struck by how useful closures can be in an Algol-family language (i.e. not Lisp). Ruby calls them “blocks” and layers infrastructure such as yield on top, but that’s fundamentally what they are. C, Java, and Python all lack a convenient equivalent (no, Python’s single-expression lambdas are too restrictive to count).
Closures…environment, get it?

Closures are incredibly powerful; you might even say they’re the ultimate language construct. Neal Gafter has a good description of the kinds of things you can do with them, written from the perspective of a language that doesn’t have them yet.
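For readers who haven’t met the idea, here is a minimal Python sketch of what a Ruby block-taking method does: the method calls (“yields to”) a piece of code supplied by the caller, and that code captures variables from the scope where it was written. Because Python’s lambdas are limited to a single expression, the “block” has to be a named nested function here; `each_line_numbered` and `collect_long_lines` are hypothetical names made up for illustration.

```python
def each_line_numbered(lines, block):
    """Call `block` once per line, roughly like a Ruby method that yields."""
    for number, line in enumerate(lines, start=1):
        block(number, line)


def collect_long_lines(lines, min_length):
    found = []  # lives in this function's scope

    def block(number, line):
        # `found` and `min_length` are captured from the enclosing scope:
        # this nested function is a closure over that environment.
        if len(line) >= min_length:
            found.append((number, line))

    each_line_numbered(lines, block)
    return found


print(collect_long_lines(["a", "hello", "hi", "world!"], 5))
# [(2, 'hello'), (4, 'world!')]
```

In Ruby the same call site would take an inline block; having to name and define `block` separately is exactly the friction that makes Python feel clumsier here.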

Polyglot programming

While I was thinking about this blog entry, I reread one of Steve Yegge’s great blog posts and decided to look up more about his reference to the author of a design pattern book “leaving Java to go to Ruby”. On Martin Fowler’s home page, I found a good wiki entry that is pretty close to what I wanted to talk about. The gist: it makes sense for large software systems to have multiple layers written in different languages.

Now, if you’re thinking “Wow, that’s obvious”, that’s good; but there is more to the story here. So let’s look at some rationales. If your app is large enough, you probably have parts which need to be fast. And you probably have other parts which cry out for a domain-specific language.

So for speed, you’ll want a lower layer, usually characterized by manifest (explicitly declared) static typing and method dispatch through vtables rather than by-name lookup. Read: C++/Java/C#.

But it often makes sense to have a higher layer that is agile. It suits the parts of your program that change rapidly: user interface bits you’re prototyping, or test cases you’re churning out. This layer is usually characterized by implicit typing (possibly with type inference), metaprogramming capabilities, and (ideally) good integration with the lower-level language. Read: Groovy/JavaScript/Python/Ruby.

There is a lot of software out there split in exactly this way; in fact, you’re almost certainly reading this blog entry in one of them, Firefox, where the answer is C++ and JavaScript. A lot of computer games are built this way too: for Civilization 4, the answer is C++ and Python, and for World of Warcraft it’s C++ and Lua. If you’re familiar with Java, just think about JSP and Ant: they’re really DSLs. If you mostly know Python or Ruby, think about how much of the underlying platform is actually written in C/Java/.NET.
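A toy illustration of that split, with both layers written in Python only to keep it self-contained: a hypothetical `Engine` class plays the fast, fixed lower layer, and a script string plays the agile upper layer, which only sees the entry points it is handed. In a real system the lower layer would be C++ or Java and the script would be JavaScript, Python, or Lua; `Engine`, `RULES`, and `run_script` are all made-up names.

```python
class Engine:
    """Hypothetical lower layer: stable, performance-sensitive code."""

    def __init__(self):
        self.units = {}

    def spawn(self, name, hp):
        self.units[name] = hp

    def damage(self, name, amount):
        self.units[name] = max(0, self.units[name] - amount)


# Hypothetical upper layer: game rules that change often, kept as a script.
RULES = """
spawn("archer", 10)
damage("archer", 3)
"""


def run_script(engine, source):
    # Expose only selected engine entry points to the script.
    # Note: an empty __builtins__ is NOT a security sandbox; it just
    # keeps this example's script surface small.
    api = {"spawn": engine.spawn, "damage": engine.damage, "__builtins__": {}}
    exec(source, api)


engine = Engine()
run_script(engine, RULES)
print(engine.units)  # {'archer': 7}
```

The design point is the narrow surface between the layers: the script can be rewritten freely as long as `spawn` and `damage` keep their meaning.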

So it’s fairly easy to dismiss anyone who says something like “everything must be written in language X”, for values of X like C, Ruby, Python, or Java. That reminds me: Eclipse really needs to embrace Eclipse Monkey.

The impedance mismatch


So we accept that it makes sense to have multiple languages with different characteristics. One important question then becomes: how similar are our two layers? Take C++ and Python, as in Civilization 4: the gap is enormous. C++ containers are not the same as Python containers. C++ strings are not the same as Python strings. C++ objects and Python objects are wildly different. The answer to this problem is to create a special glue layer; in Mozilla, it’s called XPCOM. In GNOME, it’s called pygobject. These layers are very painful to create and maintain.
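To make the mismatch concrete, here is a small sketch using Python’s standard `ctypes` module as a stand-in for a hand-written glue layer (this is not how XPCOM or pygobject work internally). Every container or string crossing the boundary has to be copied into the other side’s representation; the helper names `list_to_c_ints` and `str_to_c_string` are hypothetical.

```python
import ctypes


def list_to_c_ints(values):
    # A Python list is a growable array of object pointers; C wants a
    # fixed-size int[]. Build one and copy each element across.
    array_type = ctypes.c_int * len(values)
    return array_type(*values)


def str_to_c_string(text):
    # A Python str is Unicode text; a C char* wants encoded bytes plus a
    # trailing NUL, which create_string_buffer appends for us.
    return ctypes.create_string_buffer(text.encode("utf-8"))


nums = list_to_c_ints([3, 1, 2])
buf = str_to_c_string("héllo")

print(list(nums))  # [3, 1, 2]  (copied back into a new Python list)
print(buf.raw)     # b'h\xc3\xa9llo\x00'
print(len(buf))    # 7: five characters, one of them two bytes in UTF-8, plus NUL
```

Multiply this by every container type, string type, and object model in both languages, and the cost of maintaining a glue layer becomes clear.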

An interesting question is: what if our two languages shared more? Do we really need separate container types just to get agility and dynamism? The answer turns out to be no, which we’ll get to in a minute. The fact that most modern languages share lower-level components (like garbage collection and JIT compilation) led Microsoft to brand .NET as a multi-language runtime (as an aside, plenty of languages ran on the JVM long before .NET was created; for example Kawa, which dates to 1996). Here’s the thing, though: running on .NET does not make Python objects the same as .NET objects, nor does it make their containers the same.

What is an object?

Let’s briefly look at what a Python object is; it’s a fairly illustrative example of just how different languages can be. In Python, every object instance by default carries a dictionary (hash table), its __dict__ member, which stores its data. In general, every attribute lookup or method call has to traverse a chain of hash table lookups (the instance, then its class and base classes). At any point, some other code can come along and add a new entry to an object’s dictionary:

#!/usr/bin/python

class Test(object):
    def __init__(self, a):
        self.a = a

t = Test("hello")
t.b = 42    # no error: 'b' is simply added to t.__dict__
print(t.b)  # prints 42

Supporting this level of dynamism is expensive, both in time and space, again because every object instance carries along a mutable hash table under the covers. It means you can’t share very much between processes. It makes multi-threading much slower because everything has to be synchronized on that dictionary. Besides being expensive, it’s almost never what you actually want, at least by default. You usually want t.b to be an error. This is by far my biggest issue with Python. In fairness to Python, it predates almost every other language discussed here.
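A small sketch of what’s going on under the covers, assuming CPython 3: the instance’s attributes sit in its `__dict__`, and a simplified lookup (the `simple_lookup` helper is hypothetical and ignores descriptors such as properties) checks the instance dictionary and then each class in the method resolution order. Python does offer an opt-in way to get the stricter behavior, `__slots__`, which gives instances a fixed set of attributes and no per-instance dictionary, so assigning an undeclared attribute raises `AttributeError`.

```python
class Test(object):
    greeting = "class attribute"  # stored in Test.__dict__

    def __init__(self, a):
        self.a = a  # stored in t.__dict__


t = Test("hello")
t.b = 42  # just another entry in the instance dictionary
print(t.__dict__)  # {'a': 'hello', 'b': 42}


def simple_lookup(obj, name):
    # Simplified attribute lookup: instance dict first, then each class
    # along the MRO. Real lookup also gives data descriptors (e.g.
    # properties) priority and binds methods; this sketch does neither.
    if name in obj.__dict__:
        return obj.__dict__[name]
    for klass in type(obj).__mro__:
        if name in klass.__dict__:
            return klass.__dict__[name]
    raise AttributeError(name)


print(simple_lookup(t, "b"))         # 42
print(simple_lookup(t, "greeting"))  # class attribute


class SlottedTest(object):
    # Opt-in fixed layout: only 'a' may be set, and there is no __dict__.
    __slots__ = ("a",)

    def __init__(self, a):
        self.a = a


s = SlottedTest("hello")
rejected = False
try:
    s.b = 42
except AttributeError as exc:
    rejected = True
    print("AttributeError:", exc)
```

The catch, as the paragraph above notes, is the default: `__slots__` has to be requested class by class, while the dictionary-backed behavior is what you get otherwise.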

Stealing and language evolution

Languages are clearly stealing things from each other, and evolving together. In the Ruby book they often mention how certain parts were taken from other languages. Java and C# are stealing ideas from each other. ECMAScript 4 is clearly rebuilding itself on a more JVM/.NET-like class model.

What I’ve been looking at lately is a new dynamic language that has clearly stolen a lot of the good ideas from Ruby and Python, but is a lot more “native” to a modern runtime (in this case, the JVM): Groovy.

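#!/usr/bin/env groovy

class Test {
    String a
}

def t = new Test(a: "hello")  // named-argument constructor, no constructor code needed
t.b = 42      // throws groovy.lang.MissingPropertyException: Test has no property b
println t.b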

This throws a groovy.lang.MissingPropertyException for b, because Groovy’s idea of a class is exactly the same as the underlying JVM platform’s, where objects are much more static by default (this is also true of .NET). Note that we can even declare types if we like (or we can just use def). I really like how default constructors work: new Test(a: "hello") sets properties by name without any constructor code, which is even less typing than in both Python and Ruby! Groovy also has useful closures, plus literals for regular expressions and hash tables. Pretty cool. I also tried it on something more complex a few days ago: a fairly typical scripting task, log file processing. I’m fairly sold so far, but there is still more to learn. I spent a bit of spare time poking at getting it packaged, but ran into some Maven bootstrapping issues.
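For a rough sense of what such a log-processing task involves, here is a hypothetical sketch in Python (the Groovy script mentioned above isn’t reproduced here, and this is not a translation of it). It leans on the same features praised in Groovy: a regular expression, a hash table, and a small function doing the work. The log lines and the `count_by_status` name are made up for illustration.

```python
import re
from collections import Counter

# Made-up Apache-style access log lines, for illustration only.
SAMPLE_LOG = """\
127.0.0.1 - - [10/Oct/2007:13:55:36] "GET /index.html HTTP/1.0" 200 2326
127.0.0.1 - - [10/Oct/2007:13:56:01] "GET /missing HTTP/1.0" 404 209
10.0.0.5 - - [10/Oct/2007:13:57:12] "POST /login HTTP/1.0" 200 512
"""

# Captures the HTTP method, request path, and three-digit status code.
REQUEST = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3})')


def count_by_status(lines):
    """Count requests per HTTP status code, skipping unparseable lines."""
    counts = Counter()
    for line in lines:
        match = REQUEST.search(line)
        if match:
            counts[match.group("status")] += 1
    return dict(counts)


print(count_by_status(SAMPLE_LOG.splitlines()))  # {'200': 2, '404': 1}
```

In Groovy the regex and the map could be written as literals directly; in Python the regex needs `re.compile` and the counting leans on `collections.Counter`.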

More on languages

One random link: an awesome feature of Python is generators, and if you aren’t familiar with them and think of yourself as a “systems programmer”, check out this very good slide set.
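In the spirit of that slide set (this sketch is not taken from it), here is a minimal generator pipeline. Each stage is a generator that lazily pulls lines from the previous one, so nothing is read or filtered until the final `list()` asks for results. The stage names `read_lines`, `grep`, and `field` are hypothetical.

```python
def read_lines(text):
    # Source stage: yield one line at a time instead of building a list.
    for line in text.splitlines():
        yield line


def grep(pattern, lines):
    # Filter stage: pass through only lines containing `pattern`.
    for line in lines:
        if pattern in line:
            yield line


def field(index, lines):
    # Projection stage: yield one whitespace-separated column per line.
    for line in lines:
        yield line.split()[index]


log = "GET /a 200\nGET /b 404\nPOST /c 404\n"
pipeline = field(1, grep("404", read_lines(log)))  # nothing has run yet
print(list(pipeline))  # ['/b', '/c']
```

Because each stage handles one line at a time, the same pipeline works unchanged on input far too large to hold in memory, much like chaining Unix tools with pipes.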

Second to last, some recent additions to my Google Reader feeds: Charles Oliver Nutter, John Rose, and Lambda the Ultimate.

Not directly language-related, but also new to my feed list is Why, a great artist-programmer churning out amazing works like Shoes. Does anyone else have the feeling that for Why, all of these code projects are just what he does in his idle time, and that in the next few years he’ll emerge from his underground hideout with an army of giant robots and take over the earth?

