The Importance of Humility in Extensibility Design

When designing a protocol or programming language, including extensibility is essentially an act of humility. At a minimum one is admitting that one's design is not complete; more generally, one is admitting that it is not perfect. By providing for extensibility one enables others to improve and, in many cases, fix one's design.

Unfortunately it's easy to get extensibility design wrong. Typically such design errors come from assuming that oneself or others are perfect and, on that assumption, failing to provide sufficient extensibility. A rather subtle example of this problem recently came up in the Web Services – Business Process Execution Language Technical Committee (WS-BPEL TC).

The BPEL TC is trying to agree on how to enable extensions to BPEL and its execution environment. The working assumption of the TC is that extensions will have IDs, that those IDs will be URIs, and that those URIs will be included in each BPEL file that makes use of the identified extension. Each URI can then be tagged with a mustUnderstand attribute that indicates whether the BPEL engine must support that extension. When processing a BPEL program file, a BPEL engine will check the list of extensions, and if it doesn't recognize an extension marked "must understand" it will refuse to run the file.
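The must-understand rule can be sketched in a few lines of Python. This is only an illustration of the rule as described here, not anything taken from the TC's drafts: the ExtensionDecl class, the can_run function and the example.com URIs are all made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtensionDecl:
    """One extension declaration in a BPEL file (hypothetical model)."""
    uri: str
    must_understand: bool


def can_run(declared, supported):
    """Return True unless some must-understand extension is unsupported."""
    return all(d.uri in supported for d in declared if d.must_understand)


# Hypothetical extension URIs, for illustration only.
declared = [
    ExtensionDecl("http://example.com/ext-one", must_understand=True),
    ExtensionDecl("http://example.com/ext-two", must_understand=False),
]

# An engine that knows ext-one may run the file; ext-two is optional.
runs = can_run(declared, {"http://example.com/ext-one"})

# An engine that knows neither must refuse, because ext-one is mandatory.
refused = not can_run(declared, set())
```

Note that the check looks only at the declarations; nothing so far ties a declaration to the elements that actually appear in the file.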

Where things get interesting is a proposal to overload the extension ID, which is a URI, to also have it be the XML namespace for any XML attributes or elements associated with that extension. The attraction of doing this is that a BPEL engine can collect together all the XML attributes/elements in a BPEL program instance and determine if those elements/attributes come from the BPEL namespace or one of the extension namespaces. If the answer is neither then the element/attribute must be illegal.
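To make the proposal concrete, here is a rough Python sketch of the namespace check. It assumes the extension IDs have already been pulled out of the file; the BPEL_NS value is a placeholder rather than the real BPEL namespace, and the function names and the example.com extension are invented for the example.

```python
import xml.etree.ElementTree as ET

BPEL_NS = "http://example.org/bpel"  # placeholder, not the real BPEL namespace


def namespace_of(name):
    """Return the namespace of an ElementTree '{ns}local' name, or None."""
    if name.startswith("{"):
        return name[1:].split("}", 1)[0]
    return None


def undeclared_namespaces(xml_text, declared_extension_uris):
    """Collect namespaces that are neither BPEL nor a declared extension."""
    allowed = {BPEL_NS} | set(declared_extension_uris)
    found = set()
    for elem in ET.fromstring(xml_text).iter():
        # Check the element name and every attribute name on it.
        for name in [elem.tag] + list(elem.attrib):
            ns = namespace_of(name)
            # Unqualified names carry no namespace and are skipped here.
            if ns is not None and ns not in allowed:
                found.add(ns)
    return found


program = (
    '<process xmlns="http://example.org/bpel" xmlns:x="http://example.com/ext-one">'
    '<sequence><x:thing x:flag="yes" plain="ok"/></sequence>'
    '</process>'
)

# With no extensions declared, the extension namespace is flagged.
missing = undeclared_namespaces(program, [])

# Once the extension ID (doubling as the namespace) is declared, nothing is flagged.
clean = undeclared_namespaces(program, ["http://example.com/ext-one"])
```

The only input the check has is a set of namespace URIs, which is why everything it can say about an element or attribute is limited to which namespace it came from.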

The benefit of this trick is that it can automatically catch certain typos: if someone misspells a namespace name, the check catches it. Of course the trick is useless if someone gets the namespace right but misspells an element or attribute local name, because the only information the trick gives the BPEL engine about an extension is its namespace. The trick also can't catch elements or attributes used with the wrong syntax or in the wrong locations; again, that information is not encoded in the extension declaration. In fact the value of this trick is so minimal that one wonders why it's worth having at all. Still, I've seen variants of this design pattern attempted so many times that I think it's worth cataloging what the problems are.

In theory there is nothing wrong with the trick, so long as all extensions are perfectly designed for every possible use. In other words this design assumes extension writers are infallible.

For example, let's say that Company A comes up with two extension XML elements to BPEL, <EXT1> and <EXT2>. It puts these two elements into the http://companya.com/ext namespace which, according to the previous design, must also be the extension ID for any BPEL process that uses these two elements.

Now along comes Company B. Company B has invented its own new elements, but it turns out that Company A's <EXT1> extension element would perfectly complete Company B's extension set. However, Company A's other extension element, <EXT2>, doesn't fit in at all, so Company B doesn't want to have to support it. Naively enough, Company B creates a new extension called http://Bcompany.com/bpel which includes all the new Company B extensions as well as implicitly referencing the <EXT1> extension element from Company A's namespace.

Of course as soon as a programmer creates a BPEL program that uses <a:EXT1 xmlns:a="http://companya.com/ext"/> but only declares the extension http://Bcompany.com/bpel, the BPEL engine will throw an error. It will say "your program is in error because you use an element from the namespace http://companya.com/ext but you don't declare that as an extension."

Company B can work around this problem by telling everyone who uses Company B's extension to make sure to include an extension declaration for http://companya.com/ext but to mark it as "Must Understand = NO". Now the automatic validator won't fail on the element from Company A's namespace but by making the extension declaration optional engines won't actually have to support all of Company A's namespace, only the <EXT1> element that the Company B extension implicitly requires.

An immediate complication of this trick is that it reduces the utility of the typo prevention system. Notice what happens if a programmer using Company B's extension accidentally types in <EXT2> instead of <EXT1>. Since <EXT2> is from the same namespace as <EXT1> the automatic validator will think it's legitimate. In a funny way the more re-use extensions make of other people's work the less valuable the typo prevention mechanism is because the more extraneous namespaces will be declared and so the larger the room for error.
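The whole sequence (the error, the workaround, and the typo that then slips through) fits in a small Python sketch. It models a program as a list of (namespace, local name) pairs instead of parsing XML, and the local name "newThing" is a made-up stand-in for one of Company B's own elements.

```python
COMPANY_A = "http://companya.com/ext"
COMPANY_B = "http://Bcompany.com/bpel"


def rejected(used_names, declared_uris):
    """Return the (namespace, local name) pairs whose namespace is undeclared."""
    return [(ns, local) for ns, local in used_names if ns not in declared_uris]


# What the programmer meant to write, and the same program with a typo.
intended = [(COMPANY_B, "newThing"), (COMPANY_A, "EXT1")]
typo = [(COMPANY_B, "newThing"), (COMPANY_A, "EXT2")]

only_b = {COMPANY_B}
# Workaround: Company A's namespace is also declared, marked as optional.
with_workaround = {COMPANY_B, COMPANY_A}

error_without_workaround = rejected(intended, only_b)
ok_with_workaround = rejected(intended, with_workaround)
typo_slips_through = rejected(typo, with_workaround)
```

Here error_without_workaround contains the Company A <EXT1> pair, while both of the other results are empty lists: once Company A's namespace is declared, <EXT2> is just as acceptable to the validator as <EXT1>.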

Also note how complicated the automatic validation logic will have to be. A really simple validator would go through the optional extensions, identify any that aren't supported, and delete any elements or attributes in the BPEL file from the unsupported extensions' namespaces. But that won't be possible because extensions like Company B's may have implicit requirements for specific elements from other namespaces. So an engine that supported Company B's extension but not Company A's would need some kind of exception list telling it that it was o.k. to delete all elements and attributes from Company A's namespace except for <EXT1>.
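A sketch of what that exception list might look like, again in Python and again over (namespace, local name) pairs. The prune function and its parameters are hypothetical, as is the "newThing" local name; the point is only that the simple rule needs an extra input the extension declarations don't supply.

```python
COMPANY_A = "http://companya.com/ext"
COMPANY_B = "http://Bcompany.com/bpel"


def prune(used_names, unsupported_optional, keep_exceptions):
    """Drop names from unsupported optional namespaces, except listed exceptions."""
    return [
        (ns, local)
        for ns, local in used_names
        if ns not in unsupported_optional or (ns, local) in keep_exceptions
    ]


used = [(COMPANY_B, "newThing"), (COMPANY_A, "EXT1"), (COMPANY_A, "EXT2")]

# The engine supports Company B's extension but not Company A's optional one.
unsupported = {COMPANY_A}

# Simple rule: everything from the unsupported namespace goes, including the
# EXT1 element that Company B's extension depends on.
simple = prune(used, unsupported, keep_exceptions=set())

# With an exception list supplied alongside Company B's extension, EXT1 survives.
with_exceptions = prune(used, unsupported, keep_exceptions={(COMPANY_A, "EXT1")})
```

The exception set has to come from somewhere outside the extension declarations, since a declaration carries nothing but a URI and a mustUnderstand flag.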

But wait, the fun's not over yet! Throwing in an optional reference to http://companya.com/ext can cause additional complications because an extension doesn't just define new attributes and elements; it can also define new behavioral requirements for the BPEL engine. For example, Company A's extension might include the implicit requirement that any BPEL engine that supports the extension must use weak cryptography for web service communications in order to make Company A's extension exportable. Now consider someone who just wants to use Company B's extension, which neither needs nor wants the weak cryptography limitation. They are forced to include http://companya.com/ext as an optional extension (to subvert the 'typo' protection), and if the BPEL engine the program runs on actually supports Company A's extension, they will find their program inadvertently using only weak cryptography. This is a great example of the law of unintended consequences.

To dig out of this mess Company B will have to explicitly define, as part of its mandatory semantics, that if Company A's extension is declared but marked as optional then the weak cryptography requirement imposed by Company A's extension must be ignored.

To review, the problems with overloading the extension ID to also be the namespace name are:

Makes Re-Use Painful – One can't just declare a single extension and be done. Rather, one has to declare the extension and then declare an extension for each and every namespace that the first extension draws elements or attributes from. In addition, the extension designer has to review the implied semantics of every extension it re-uses elements or attributes from and determine which of their implicit engine semantics to de-activate.
Doesn't Actually Catch Many Typos – Because the typo system only works at the namespace level, it already can't catch errors in element or attribute local names. And because one must add a declaration for every namespace one borrows so much as a single attribute or element from, the typo validator becomes even less useful.

The core of all of these problems is the underlying assumption that people who design extensions are infallible: that extension designers will exactly nail the semantics, attributes and elements that everyone else will want to re-use, so there will never be a need to borrow just some of the attributes or elements in an extension or to override any of the extension's implicit engine semantics. A humbler approach admits the possibility, and indeed the probability, of mistakes in the granularity of extensions and so doesn't overload extension IDs with namespace functionality.
