So you have some XML…

Recently I battled the Typica library for accessing Amazon Web Services. There was an obscure conflict between its JAXB dependency and our app server. JAXB is…well, at its root it compiles XML schemas into Java classes through a binding mapping. In theory this is cool: you can get IDE completion for web services. In practice, though, it just feels to me like it’s not worth the complexity.

So I needed to quickly roll my own Java bindings for AWS. Basically, we need to make an HTTP POST request and then parse the small XML document that comes back. For the first part, Apache Commons HttpClient is quite good. But then I was in a situation I’m sure many developers find themselves in: I needed to get the data out of a small XML document, with no need for schema validation or the like.

There are a lot of XML libraries out there. We could use the DOM API, which is familiar because it’s fairly standardized across languages like Python, Java, and JavaScript, despite how nonnative it feels in all of them. If you’re really a masochist, you’ll try to write a state machine with SAX. Then there are the bindings that target a specific language, like JDOM for Java and ElementTree for Python. These can be nice, but they’re not portable if you happen to move between languages (and many developers today probably move between JavaScript and a sane language on the server side). So there is another option that gives you a decent API and is also largely portable: XPath with DOM.
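To give a sense of what that SAX state machine involves, here is a minimal sketch of a handler that pulls the Body text out of each Message in a response like the sample document below. The class name BodyCollector and its collect helper are made up for illustration, and for brevity the handler matches on local names only and ignores namespace URIs.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: tracks just enough state to know when it is inside Message/Body.
// Matches on local names only; namespace URIs are ignored in this sketch.
public class BodyCollector extends DefaultHandler {
    private final List<String> bodies = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    private boolean inMessage = false;
    private boolean inBody = false;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        if ("Message".equals(localName)) {
            inMessage = true;
        } else if (inMessage && "Body".equals(localName)) {
            inBody = true;
            text.setLength(0); // start collecting a fresh Body value
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // characters() may deliver one text node in several chunks, so accumulate.
        if (inBody) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (inBody && "Body".equals(localName)) {
            bodies.add(text.toString());
            inBody = false;
        } else if ("Message".equals(localName)) {
            inMessage = false;
        }
    }

    public List<String> getBodies() {
        return bodies;
    }

    // Parses the XML string and returns the Body text of every Message, in document order.
    public static List<String> collect(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // ensures localName is populated
        BodyCollector handler = new BodyCollector();
        factory.newSAXParser().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.getBodies();
    }
}
```

Even this small job needs position flags and a text buffer, and extracting ReceiptHandle as well would mean another flag and another buffer.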

Here’s our sample document:

<ReceiveMessageResponse xmlns="http://queue.amazonaws.com/doc/2008-01-01/">
  <ReceiveMessageResult>
    <Message>
      <MessageId>11YEJMCHE2DM483NGN40|3H4AA8J7EJKM0DQZR7E1|PT6DRTB278S4MNY77NJ0</MessageId>
      <ReceiptHandle>Z2hlcm1hbi5kZXNrdG9wLmFtYXpvbi5jb20=:AAABFoNJa/AAAAAAAAAANwAAAAAAAAAAAAAAAAAAAAQAAAEXAMPLE</ReceiptHandle>
      <MD5OfBody>acbd18db4cc2f85cedef654fccc4a4d8</MD5OfBody>
      <Body>foo</Body>
    </Message>
    <Message>
      <MessageId>0MKX1FF3JB8VWS8JAV79|3H4AA8J7EJKM0DQZR7E1|PT6DRTB278S4MNY77NJ0</MessageId>
      <ReceiptHandle>X5djmi3uoi2zZ8Vdr5TkmAQtDTwrcd9lx87=:AAABFoNJa/AAAAAAAAAANwAAAAAAAAAAAAAAAAAAAAQAAAEXAMPLE</ReceiptHandle>
      <MD5OfBody>37b51d194a7513e45b56f6524f2d51f2</MD5OfBody>
      <Body>bar</Body>
    </Message>
  </ReceiveMessageResult>
  <ResponseMetadata>
    <RequestId>b5bf2332-e983-4d3e-941a-f64c0d21f00f</RequestId>
  </ResponseMetadata>
</ReceiveMessageResponse>

And here’s the code:

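import javax.xml.xpath.XPathConstants;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Parse the response body, binding the prefix "q" to the SQS namespace
XmlParseData parsed = XmlUtils.parseXml(post.getResponseBodyAsStream(),
    new String[] { "q", "http://queue.amazonaws.com/doc/2008-01-01/" });
// Select every Message element in the result
NodeList msgNodes = (NodeList) parsed.xpath.evaluate(
    "/q:ReceiveMessageResponse/q:ReceiveMessageResult/q:Message",
    parsed.doc, XPathConstants.NODESET);
Message[] msgResult = new Message[msgNodes.getLength()];
for (int i = 0; i < msgNodes.getLength(); i++) {
    Node node = msgNodes.item(i);
    // Relative expressions are evaluated against the current Message node
    msgResult[i] = new Message(parsed.xpath.evaluate("q:Body", node),
        parsed.xpath.evaluate("q:ReceiptHandle", node));
}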

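I think this is pretty clear; the straight DOM or SAX alternatives would have taken notably more lines of code. The key part is the XPath expression /q:ReceiveMessageResponse/q:ReceiveMessageResult/q:Message, which selects all the Message nodes. The q: prefix is needed because the response elements are in the http://queue.amazonaws.com/doc/2008-01-01/ namespace, and in XPath 1.0 an unprefixed name only matches elements that are in no namespace.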
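To make the line-count comparison concrete, here is a rough sketch of the same extraction done with plain DOM traversal and no XPath, using only the JDK. The class and method names are made up for illustration, and each message comes back as a two-element String[] of {Body, ReceiptHandle} instead of the post’s own Message class.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical comparison class: plain DOM traversal, no XPath.
public class DomExample {
    static final String NS = "http://queue.amazonaws.com/doc/2008-01-01/";

    // Collects the direct child elements of parent that are in NS with the given local name.
    static List<Element> children(Element parent, String localName) {
        List<Element> found = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE
                    && NS.equals(n.getNamespaceURI())
                    && localName.equals(n.getLocalName())) {
                found.add((Element) n);
            }
        }
        return found;
    }

    // Text of the first matching child, or "" (what XPath string evaluation returns for no match).
    static String firstChildText(Element parent, String localName) {
        List<Element> matches = children(parent, localName);
        return matches.isEmpty() ? "" : matches.get(0).getTextContent();
    }

    // Returns one {Body, ReceiptHandle} pair per Message element.
    static List<String[]> parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // without this, getLocalName() returns null
        Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        List<String[]> messages = new ArrayList<>();
        Element root = doc.getDocumentElement();
        if (!NS.equals(root.getNamespaceURI())
                || !"ReceiveMessageResponse".equals(root.getLocalName())) {
            return messages; // unexpected root element: nothing to extract
        }
        // Each nested loop corresponds to one step of the XPath location path.
        for (Element result : children(root, "ReceiveMessageResult")) {
            for (Element msg : children(result, "Message")) {
                messages.add(new String[] {
                    firstChildText(msg, "Body"),
                    firstChildText(msg, "ReceiptHandle")
                });
            }
        }
        return messages;
    }
}
```

The two helpers and the nested loops are all doing the work that one location path string does in the XPath version.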

For more about parsing XML in Java (much of which also applies to other systems that have XPath bindings), this DeveloperWorks article may be useful. Also, this is the wrapper class I used in the code sample. And yes, I am using new String[] to work around Java’s lack of hash literals.
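The wrapper class mentioned above isn’t included in this text, so the following is a hypothetical reconstruction of what XmlUtils.parseXml and XmlParseData might look like, inferred only from how the code sample calls them: a static parseXml(InputStream, String[]) returning an object with public doc and xpath fields. It assumes the String[] holds alternating prefix and namespace-URI pairs, which is one guess at how the new String[] trick stands in for a map literal.

```java
import java.io.InputStream;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

// Hypothetical holder: the parsed document plus an XPath object that knows the prefixes.
class XmlParseData {
    public final Document doc;
    public final XPath xpath;

    XmlParseData(Document doc, XPath xpath) {
        this.doc = doc;
        this.xpath = xpath;
    }
}

// Hypothetical reconstruction of the wrapper; not the author's actual code.
class XmlUtils {
    // prefixPairs alternates prefix and namespace URI: { "q", "http://...", ... }
    public static XmlParseData parseXml(InputStream in, String[] prefixPairs) throws Exception {
        if (prefixPairs.length % 2 != 0) {
            throw new IllegalArgumentException("prefixPairs must hold prefix/URI pairs");
        }
        final Map<String, String> prefixToUri = new HashMap<>();
        for (int i = 0; i < prefixPairs.length; i += 2) {
            prefixToUri.put(prefixPairs[i], prefixPairs[i + 1]);
        }

        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // prefixed XPath steps match by namespace URI
        Document doc = factory.newDocumentBuilder().parse(in);

        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(new NamespaceContext() {
            @Override
            public String getNamespaceURI(String prefix) {
                if (prefix == null) {
                    throw new IllegalArgumentException("null prefix");
                }
                String uri = prefixToUri.get(prefix);
                return uri != null ? uri : XMLConstants.NULL_NS_URI;
            }

            @Override
            public String getPrefix(String namespaceURI) {
                for (Map.Entry<String, String> e : prefixToUri.entrySet()) {
                    if (e.getValue().equals(namespaceURI)) {
                        return e.getKey();
                    }
                }
                return null;
            }

            @Override
            public Iterator<String> getPrefixes(String namespaceURI) {
                String prefix = getPrefix(namespaceURI);
                return prefix == null
                        ? Collections.<String>emptyIterator()
                        : Collections.singletonList(prefix).iterator();
            }
        });
        return new XmlParseData(doc, xpath);
    }
}
```

With these two classes available, the loop from the code sample compiles as written, given the HTTP response stream and a Message class. The detail that matters most is setNamespaceAware(true): a step like q:Message matches on namespace URI, so it finds nothing in a DOM built without namespace awareness.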
