My XML Wish – Stuff Yaron Finds Interesting

If I were XML king for a day and could make any change I wanted to XML, it would be to add length encoding. Length encoding would provide an order of magnitude better performance in handling XML messages, make XML proxies practical, and finally rid us of MIME.

It turns out that sending a message over the wire is not the expensive part of processing it. Network transmission is handled by dedicated hardware that takes care of moving the bytes in and out of the network. Even moving a message through the CPU need not be terribly expensive. Most modern CPUs have instructions designed to bulk move sections of data through the CPU in a very small number of clock/bus cycles. The real expense comes when one has to actually 'touch' the message.

Unfortunately, XML is as bad as it's possible for a text-based protocol to be in this regard. When an XML message comes in, every byte of the message has to be individually examined by the CPU. Because XML is not a 'forward looking' data format, the XML processor has no idea what's in the XML until it actually touches each byte in the XML. Think of a piece of XML like <foo>Hi Mom</foo>. Let us say that the processor doesn't actually care about the foo element. There is no way for it to skip over foo because it has no idea how long the value inside foo is. So the processor has to waste cycles touching every byte in the string "Hi Mom" just so it can find the last byte and then throw the whole string out.

A close variant of this problem is moving binary data. Certain kinds of data are going to remain non-text. Video, audio, compressed and encrypted information are high on that list. To move a piece of binary data inside of an XML document you have to base-64 encode it. This means that if someone sends a picture that the processor just wants to record in a file, it can't just blindly copy the bytes into a file using bulk memory transfers. It actually has to touch each and every byte, decode it and then save it.
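To make the base-64 cost concrete, here is a minimal Python sketch. The payload is made-up sample data standing in for a picture, not anything from a real message. It shows the two costs: the encoded form is a third larger than the raw bytes, and the receiver has to run a decoder over all of it before it has bytes it can write to a file.

```python
import base64

# Made-up payload standing in for a picture: 3072 bytes covering every byte value.
raw = bytes(range(256)) * 12

# The sender has to encode before the bytes can sit inside an XML document.
encoded = base64.b64encode(raw)

# The receiver has to decode every byte before it can save the original data.
decoded = base64.b64decode(encoded)

print(len(raw), len(encoded))  # the encoded form is 4/3 the size of the raw form
```

Every 3 raw bytes become 4 encoded characters, so the overhead is paid on the wire as well as in the CPU.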

There is a simple way around this problem that I and others suggested while XML was being developed[1]: use length-encoded strings. Imagine if one could say in XML <foo jump="6">Hi Mom</foo>. This would tell the XML parser that foo's content ended 6 bytes after the closing angle bracket of the opening tag. Now imagine <foo binary="true" jump="6">Hi Mom</foo>. This would tell the processor that the data ended six bytes later and, btw, the data is binary, not XML, so don't try to parse it as XML. The jump attribute would be completely optional.
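Here is a rough Python sketch of how a processor might use jump. No such attribute exists in real XML, so the tag grammar, the regular expressions and the read_element helper are all my own assumptions for illustration. When jump is present the reader steps over the content without looking at it; when it is absent the reader has to search the content for the closing tag.

```python
import re

# Hypothetical, simplified tag syntax: <name attr="value" ...>content</name>
OPEN_TAG = re.compile(rb'<(\w+)((?:\s+\w+="[^"]*")*)\s*>')
ATTR = re.compile(rb'(\w+)="([^"]*)"')


def read_element(buf, pos=0):
    """Return (name, content, next_pos) for the element that starts at pos."""
    m = OPEN_TAG.match(buf, pos)
    if m is None:
        raise ValueError("no opening tag at offset %d" % pos)
    name = m.group(1)
    attrs = dict(ATTR.findall(m.group(2)))
    close = b"</" + name + b">"
    start = m.end()
    if b"jump" in attrs:
        # Length known up front: step over the content without inspecting it.
        end = start + int(attrs[b"jump"])
    else:
        # No length: search the content for the closing tag.
        # (This simple fallback does not handle nested elements of the same name.)
        end = buf.index(close, start)
    if not buf.startswith(close, end):
        raise ValueError("jump does not land on the closing tag")
    return name, buf[start:end], end + len(close)


name, content, next_pos = read_element(b'<foo jump="6">Hi Mom</foo>')
print(name, content, next_pos)
```

Because the jump path never inspects the content, the same helper works when the content is raw binary that happens to contain angle brackets.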

Length encoding does raise one problem. Imagine a user has asked for a file and the processor needs to encrypt the file before transmitting it. To use the jump attribute it has to know the length of the encrypted output before it can actually transmit the output; otherwise, how will the processor correctly set the jump value? This is a well-known problem in protocol circles. The first solution to the problem was to require the processor to queue up the entire output before sending it so that the full length would be known. But this turns out to be a bad idea because all the memory that's being used to queue up the content can't be used to handle other messages.

A better way to use resources is to support a format like <foo binary="true" jump="6" continue="true">Hi Mom</foo><continue jump="7" continue="true"> Hi Dad</continue><continue continue="false"/>. This is what's known as a chunked encoding. Instead of forcing the processor to queue up the entire value just so it can find out the true length, we allow it to output the part it already has (in this case the first 6 bytes) and then say "BTW, there is more". The processor can then continue to send out additional chunks until it gets to the end. So from a conceptual point of view the previous piece of XML is exactly equivalent to <foo>Hi Mom Hi Dad</foo>.
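The chunked format can be sketched in Python along the same lines. Again, encode_chunks, decode_chunks and the tag grammar are hypothetical names and simplifications of mine, and the encoder assumes it is handed at least one piece. The sender emits each piece as soon as it has it and never needs the total length; the receiver stitches the pieces back together.

```python
import re

# Hypothetical, simplified tag syntax; the optional "/" marks an empty element.
TAG = re.compile(rb'<(\w+)((?:\s+\w+="[^"]*")*)\s*(/?)>')
ATTR = re.compile(rb'(\w+)="([^"]*)"')


def encode_chunks(name, pieces):
    """Yield one chunk per piece as it arrives (assumes at least one piece)."""
    tag, extra = name, b' binary="true"'
    for piece in pieces:
        yield b'<%s%s jump="%d" continue="true">%s</%s>' % (
            tag, extra, len(piece), piece, tag)
        tag, extra = b"continue", b""   # later chunks use <continue>
    yield b'<continue continue="false"/>'  # terminator: nothing more follows


def decode_chunks(buf, pos=0):
    """Return (name, value, next_pos) for a chunked element starting at pos."""
    name, parts = None, []
    while True:
        m = TAG.match(buf, pos)
        if m is None:
            raise ValueError("no tag at offset %d" % pos)
        tag = m.group(1)
        attrs = dict(ATTR.findall(m.group(2)))
        if name is None:
            name = tag                  # the first chunk names the element
        pos = m.end()
        if m.group(3) != b"/":          # a non-empty chunk carries jump bytes
            end = pos + int(attrs[b"jump"])
            close = b"</" + tag + b">"
            if not buf.startswith(close, end):
                raise ValueError("jump does not land on the closing tag")
            parts.append(buf[pos:end])
            pos = end + len(close)
        if attrs.get(b"continue") != b"true":
            return name, b"".join(parts), pos


wire = b"".join(encode_chunks(b"foo", [b"Hi Mom", b" Hi Dad"]))
print(wire)
print(decode_chunks(wire))
```

The point of the generator is that each chunk can go out on the wire and be forgotten, so the sender's memory use is bounded by the largest piece rather than by the whole value.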

One could even imagine a generalization of this feature to allow for named targets. For example, <foo target="bar" jump="19">Hi Mom</foo><blah/><bar/>. Using chunked encoding with jump would make for some interesting scenarios, especially if nesting is allowed.
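One possible reading of the named target example, sketched in Python: the jump is counted from the end of foo's opening tag, and 19 bytes covers "Hi Mom" (6), "</foo>" (6) and "<blah/>" (7), which lands exactly on <bar/>. The follow_target helper and the check that the landing tag matches the target name are assumptions of mine, not part of any real specification.

```python
import re

# Hypothetical, simplified tag syntax; matches both <name ...> and <name .../>.
TAG = re.compile(rb'<(\w+)((?:\s+\w+="[^"]*")*)\s*/?>')
ATTR = re.compile(rb'(\w+)="([^"]*)"')


def follow_target(buf, pos=0):
    """Return the offset of the target element named by the tag at pos."""
    m = TAG.match(buf, pos)
    if m is None:
        raise ValueError("no tag at offset %d" % pos)
    attrs = dict(ATTR.findall(m.group(2)))
    # Count the jump from the end of the opening tag, skipping everything between.
    dest = m.end() + int(attrs[b"jump"])
    landed = TAG.match(buf, dest)
    if landed is None or landed.group(1) != attrs[b"target"]:
        raise ValueError("jump does not land on the named target")
    return dest


doc = b'<foo target="bar" jump="19">Hi Mom</foo><blah/><bar/>'
offset = follow_target(doc)
print(doc[offset:])
```

A proxy that only cares about bar never has to look at foo's content or at blah.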

The goal in all of this is to make it cheap to process XML. With length encoding it would become possible to have high performance XML proxies that could quickly jump to the part of the XML relevant to them. Imagine having a set of named targets at the top of a SOAP message pointing to the SOAP headers in the message. Or imagine sending a large binary file without having to base-64 encode it. Besides, length encoding could kill MIME once and for all.

The usual counterargument is that chunked encoding should be handled one layer down (as it is in HTTP). But that argument doesn't apply here because the chunked encoding must have specific knowledge of the XML content in order to provide for targeted jumps or to ensure that binary data is treated as such by the XML processor. Breaking the functional separation barrier by building this knowledge into the transport is generally bad design and will just flat out break if a piece of XML has to move over multiple different transports, as is already de rigueur today.

[1] I distinctly remember discussing this issue with Roy Fielding at Jim Whitehead's wedding (hey, the wedding was an awesome three-day affair up on gorgeous Mount Tamalpais and this discussion was late on the first day). He wanted to put in jump pointers for all the children of the root element.
