Adding Extensibility to JSON Data Formats

How does one version a JSON message? How does one make it backward or forward compatible? This may not be an issue when one's own JavaScript is talking to one's own server via JSON, but once you throw in creative use of script tags with src attributes, or environments other than the browser, these issues become relevant.

The Problem

How does one process JSON messages so that they will support both backwards and forwards compatibility? That is, how does one add new content into an existing JSON message format such that those who do not understand the extended content will be able to safely ignore it?

The Proposed Solution

In the absence of additional information providing guidance on how to handle unrecognized members, a JSON processor compliant with this proposal MUST ignore any members whose names are not recognized by the processor.

For example, if a processor was expecting to receive an object that contained a single member with the name "movieTitle" and instead it receives an object with multiple members including "movieTitle", "producer" and "director" then the JSON processor would, by default, act as if the "producer" and "director" members were not present.
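A minimal Python sketch of this default behavior, assuming a hypothetical processor that recognizes only "movieTitle". The function name read_known_members, the KNOWN_MEMBERS set and the sample values are illustrative and not part of the proposal.

```python
import json

# Hypothetical: the only member name this processor understands.
KNOWN_MEMBERS = {"movieTitle"}

def read_known_members(message: str) -> dict:
    """Parse a JSON object and return only the members this processor recognizes."""
    data = json.loads(message)
    return {name: value for name, value in data.items() if name in KNOWN_MEMBERS}

extended = '{"movieTitle": "Example Film", "producer": "A. Producer", "director": "A. Director"}'
print(read_known_members(extended))  # {'movieTitle': 'Example Film'}
```

The processor behaves exactly as it would have for a message containing only "movieTitle".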

An exception to this situation would be a member named "movie" whose value is an object where the semantics of the members of that object are "the local names of the members of this object are suitable for presenting as titles and their values as text under those titles". In that case, regardless of the processor's direct knowledge of the semantics of the members of the object (e.g. the processor may actually know about "movieTitle" but not "producer" or "director"), the processor can still process the unrecognized members because it has additional information about how to process them.
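As a rough illustration of that exception, the sketch below renders every member of the "movie" object as a title with text under it, whether or not the processor knows the member's name. The function name render_movie and the sample values are assumptions made for the example.

```python
import json

def render_movie(message: str) -> list[str]:
    """Render each member of the "movie" object as a 'title: text' line.

    No per-member knowledge is needed: the rule attached to "movie" says
    every member name is a title and every value is the text under it.
    """
    movie = json.loads(message)["movie"]
    return [f"{name}: {value}" for name, value in movie.items()]

msg = '{"movie": {"movieTitle": "Example Film", "producer": "A. Producer", "director": "A. Director"}}'
for line in render_movie(msg):
    print(line)
```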

This requirement does not apply to incorrect usage of recognized names. For example, if the definition of an object only allowed a single "movieTitle" member then having two "movieTitle" members is simply an error and the ignore rule does not apply.
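One way to catch this particular error in Python is sketched below. By default, json.loads silently keeps the last value when a member name repeats, so the sketch uses the object_pairs_hook parameter to see every name/value pair. The names reject_duplicate_members and parse_strict are hypothetical.

```python
import json

def reject_duplicate_members(pairs):
    """object_pairs_hook: build a dict, but fail if a member name repeats."""
    result = {}
    for name, value in pairs:
        if name in result:
            raise ValueError(f"duplicate member: {name!r}")
        result[name] = value
    return result

def parse_strict(message: str):
    """Parse JSON, treating repeated member names in any object as an error."""
    return json.loads(message, object_pairs_hook=reject_duplicate_members)

print(parse_strict('{"movieTitle": "Example Film"}'))
```

Calling parse_strict on a message with two "movieTitle" members raises ValueError instead of quietly ignoring one of them.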

This specification does not require that ignored members be removed from the JSON structure. It is quite possible that other processors that deal with the message will recognize members the current processor does not. Therefore it makes sense to let unrecognized members remain in the JSON structure so that others who process the structure may benefit from the extended information.
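A small sketch of a processor that changes the one member it understands and forwards the rest untouched, assuming it only knows "movieTitle". The function name retitle_and_forward and the sample data are made up for illustration.

```python
import json

def retitle_and_forward(message: str, new_title: str) -> str:
    """Update the member we understand; pass everything else through as-is."""
    data = json.loads(message)
    data["movieTitle"] = new_title  # the recognized member
    return json.dumps(data)         # unrecognized members are still present

incoming = '{"movieTitle": "Old Title", "producer": "A. Producer"}'
outgoing = json.loads(retitle_and_forward(incoming, "New Title"))
print(outgoing)  # {'movieTitle': 'New Title', 'producer': 'A. Producer'}
```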

Definition: Simple value – A value of a type other than array or object.

If a JSON processor encounters an array where it expected a simple value, the processor MUST retrieve the first simple value in the array, treat it as the value it was expecting, and ignore the other elements in the array.

For example, if the processor was expecting a member with a number value, e.g.

{ "orderid":1234 } 

and instead sees

{ "orderid": [ { "shipped" : true }, 798, {"revocable" : false}]}

then the processor would treat the value of orderid as 798 and ignore the other elements in the array.

When extending a simple value using the array mechanism, the array MUST contain exactly one simple value, and that simple value MUST be of the expected type. All other elements of the array MUST be objects. If any of these rules is violated, the JSON structure MUST be treated as violating its definition.
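The array rules above can be sketched in Python roughly as follows. The helper name resolve_simple and its expected_type parameter are assumptions for the example; the proposal itself does not define an API.

```python
import json

# JSON simple values as Python types: string, number, true/false, null.
SIMPLE_TYPES = (str, int, float, bool, type(None))

def resolve_simple(value, expected_type):
    """Return the simple value, unwrapping an extension array if one is present."""
    if isinstance(value, list):
        simple = [v for v in value if isinstance(v, SIMPLE_TYPES)]
        others = [v for v in value if not isinstance(v, SIMPLE_TYPES)]
        # Exactly one simple value; every other element must be an object.
        if len(simple) != 1 or not all(isinstance(v, dict) for v in others):
            raise ValueError("extension array violates its definition")
        value = simple[0]
    # bool is a subclass of int in Python, so check it separately.
    if isinstance(value, bool) and expected_type is not bool:
        raise ValueError("value is not of the expected type")
    if not isinstance(value, expected_type):
        raise ValueError("value is not of the expected type")
    return value

plain = json.loads('{"orderid": 1234}')
extended = json.loads('{"orderid": [{"shipped": true}, 798, {"revocable": false}]}')
print(resolve_simple(plain["orderid"], int))     # 1234
print(resolve_simple(extended["orderid"], int))  # 798
```

An array with two simple values, a nested array, or a simple value of the wrong type raises ValueError, which corresponds to treating the structure as violating its definition.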

Q&A

Wait, does ignore mean delete or, well, ignore?

The proposal is intentionally fuzzy on this particular point because the best behavior depends on context. For example, if one is writing JavaScript that will accept a JSON structure, manipulate it locally and never share anything with anyone else, then just deleting unrecognized members is probably best. It will certainly simplify programming.

But if one is writing an application, even a browser-based one, that will be sharing data taken from the JSON structure with others, then it would be better to leave unrecognized members in place and simply 'pretend' they are not there for purposes such as iteration.

Because the best choice depends on the situation, the proposal leaves it up to the processor to decide on the best course of action.

What the heck is up with the simple value/array bizarreness?

It's a rule of extensibility that one never knows what will reasonably need to be extended. It is just as likely that a simple value might need to be extended as a non-simple value (i.e. an object or array). In XML, extending simple types is pretty straightforward: just stick in elements. This is a vestige of XML's markup ancestry. But JSON has no similar mechanism, so I had to invent one.

It was tempting to just ignore this issue, but in practice there are often good reasons to extend simple types. Without a mechanism such as the one described above, the only way to extend a simple type would be to create a sibling to it under the parent object/array and then come up with some way to specify that the extension is related to the original simple value. Having to use cross-linking in order to enable basic extensibility is both overly complex and unreadable. So I felt the array solution was the least awful way out.

Why an array and not an object?

I specify that extension of simple values is by array because this kept me from having to invent a bogus 'name' just to record the simple type. For example, if I had used an object for extending simple values then the previous example would have to look something like:

{ "orderid": { "shipped" : true, "originalvalue" : 798, "revocable" : false } }

where "originalvalue" would be a magic reserved name that would identify the original value. This seemed a bit too heavyweight to me. Besides, I generally like to avoid magical names.

Why only allow a single simple value in the extension array?

Because I didn't want to turn JSON into a pseudo markup language. E.g. in a markup language it is common to break simple types into pieces and then wrap them in elements in order to mark them up. JSON could do something similar. E.g.:

{ "name": "Mark Nottingham"}

could be extended into

{ "name" : ["Mark",{"firstname":"true"},"Nottingham",{"lastname":"true"}]}

Even the ordering offered by the array would be useful. For example:

{ "name" : ["Hong", {"lastname":"true"},"Gidong",{"firstname":"true"}]}

But the previous isn't a real markup language. It couldn't, for example, wrap multiple simple values the way a real markup language could. Of course we could always fix that, e.g.:

{ "name" : [{"lastname": "Hong"}, {"firstname": "Gidong"}]}

We could then specify rules for extending simple values that say things like "do a depth-first traversal of all values and concatenate the results". But this brings up more than a few nasty problems, ranging from how to tell apart extension objects containing text that isn't relevant to the original simple value to how to combine simple values of different types (e.g. what should happen with true + 123 + "hi"?).

All of these problems are solvable but I think this makes the situation way too complex. So I opted to just allow a single simple value and avoid the whole mess. Yes, this will restrict certain kinds of extensibility but I felt that turning JSON into a markup language was a step too far.
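To make the rejected alternative concrete, here is a sketch of the depth-first traversal mentioned above. The function name collect_simple_values and the mixed example are hypothetical. Collecting the values is easy; what is left undefined is how to combine them.

```python
def collect_simple_values(value):
    """Depth-first traversal yielding every simple value found."""
    if isinstance(value, dict):
        for child in value.values():
            yield from collect_simple_values(child)
    elif isinstance(value, list):
        for child in value:
            yield from collect_simple_values(child)
    else:
        yield value

name = [{"lastname": "Hong"}, {"firstname": "Gidong"}]
print(list(collect_simple_values(name)))   # ['Hong', 'Gidong']

# Mixed types: the traversal works, but there is no obvious way to
# concatenate a boolean, a number and a string into one simple value.
mixed = [True, {"note": 123}, "hi"]
print(list(collect_simple_values(mixed)))  # [True, 123, 'hi']
```

The second case also shows the relevance problem: nothing in the structure says whether 123 belongs to the original value or is unrelated extension data.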

Why must all the elements in the extension array but the simple value be objects?

The explicit purpose of the array extension mechanism is to support, well, extensions. A good extension must be self-describing. In JSON the only truly self-describing entity is the object. So I require everything but the core simple value in the extension array to be an object.

2 thoughts on “Adding Extensibility to JSON Data Formats”

  1. I think “processor MUST ignore” is as far as you can go. I know you’re making this simple, but even still, it seems like it assumes too much of the processor. If you want people to use your service, you should only expect them to build what they need today.

    How bad would it really be to ask that service providers use new names for new things and leave the old things alone? I think I’d rather deal with the clutter, especially in the case of simple types. I think most of your examples read better if you just add new fields – you may wonder why there’s redundant data, but you won’t have trouble understanding the message.

    I know it’s not always possible to know what you will need, but I’d rather put the burden on protocol designers to plan for extensibility. If you think you may want to extend something, make it an object, not a simple value.

  2. The more I think about this the more I think you’re right. Simpler is usually better and extensibility is a pain even in the best case. Besides, I don’t look forward to explaining the whole array mess. I’d like to get a little more feedback but my guess is that I’ll probably just remove the simple value extensibility altogether.
