Adding Namespaces to JSON
Wednesday August 02nd 2006, 12:00 am
Filed under: SOA/Web/Etc.

As part of my job I am thinking deep thoughts about what protocols Windows Live should expose and right now I'm pushing hard for JSON to be a premier protocol data format. I like JSON because it makes it extremely easy to persist hierarchical data structures which account for the bulk of the messages that Windows Live needs to move around. But JSON does have a number of issues that I think need to be addressed, specifically: namespaces, extensibility, schema/type support and relative linking. In this article I make a proposal for how to address namespaces. I will address the other issues in future articles.


The Problem

If two groups both create a name "firstName" and each gives it a different syntax and semantics, how is someone handed a JSON document supposed to know which group's syntax/semantics to apply? In some cases there might be enough context (e.g. the data was retrieved from one of the groups' servers) to disambiguate the situation, but it is increasingly common for distributed services to be created where the original source of some piece of information can trivially be lost somewhere down the processing chain. It would therefore be extremely useful for JSON documents to be 'self describing' in the sense that one can look at any name in a JSON document in isolation and have some reasonable hope of determining whether that particular name represents the syntax and semantics one is expecting.

The Proposed Solution

It is proposed that JSON names be defined as having two parts, a namespace name and a local name. The two are combined as namespace name + "." + local name to form a fully qualified JSON name. Namespace names MAY contain the "." character. Local names MUST NOT contain the "." character. Namespace names MUST consist of the reverse listing of subdomains in a fully qualified DNS name. E.g. org.goland or com.example.bigfatorg.definition.
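As a rough illustration of the naming rule, here is a minimal Python sketch. The helper names split_name and join_name are hypothetical (the proposal defines no API). It relies only on the rule that local names cannot contain ".", so the last "." in a fully qualified name is always the separator.

```python
def split_name(name):
    """Split a JSON name into (namespace, local name).

    Local names never contain ".", so the last "." is the separator.
    A name without any "." has no explicit namespace; None is returned for it.
    """
    namespace, dot, local = name.rpartition(".")
    if not dot:
        return None, local
    return namespace, local


def join_name(namespace, local):
    """Build a fully qualified name as namespace + "." + local name."""
    if "." in local:
        raise ValueError("local names must not contain '.'")
    return namespace + "." + local


print(split_name("com.example.bigfatorg.definition.title"))
# ('com.example.bigfatorg.definition', 'title')
print(join_name("org.goland", "firstName"))
# org.goland.firstName
```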

To enable space savings and to increase both the readability and write-ability of JSON a JSON name MAY omit its namespace name along with the "." character that concatenated it to its local name. In this case the namespace of the name is logically set to the namespace of the name's parent object. E.g.

{ "org.goland.schemas.projectFoo.specProposal" :
  { "title": "JSON Extensions",
    "author": { "firstName": "Yaron",
                "com.example.schemas.middleName": "Y",
                "org.goland.schemas.projectFoo.lastName": "Goland"
              }
  }
}

In the previous example the name firstName, because it lacks a namespace, takes on its parent object's namespace. That parent is author, which also lacks a namespace, so author recursively looks to its parent, specProposal, which does have a namespace: org.goland.schemas.projectFoo. middleName introduces a new namespace, "com.example.schemas". If its value were an object then the names in that object would inherit the com.example.schemas namespace. Because the use of the compression mechanism is optional, lastName can be fully qualified even though it shares the same namespace as its parent.
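The inheritance rule can be sketched in Python as a function that expands every name to its fully qualified form. This is only one possible reading of the proposal: the function name qualify is hypothetical, and letting array elements inherit the namespace of the name that holds the array is an assumption, since the proposal does not mention arrays.

```python
def qualify(value, inherited=None):
    """Return a copy of value with every object name fully qualified."""
    if isinstance(value, dict):
        result = {}
        for name, child in value.items():
            namespace, dot, local = name.rpartition(".")
            if not dot:
                # No namespace on the name: take the parent's namespace.
                if inherited is None:
                    raise ValueError(f"root name {name!r} is not fully qualified")
                namespace = inherited
            result[namespace + "." + local] = qualify(child, namespace)
        return result
    if isinstance(value, list):
        # Assumption: array elements inherit the enclosing namespace.
        return [qualify(item, inherited) for item in value]
    return value


doc = {
    "org.goland.schemas.projectFoo.specProposal": {
        "title": "JSON Extensions",
        "author": {
            "firstName": "Yaron",
            "com.example.schemas.middleName": "Y",
            "org.goland.schemas.projectFoo.lastName": "Goland",
        },
    }
}
expanded = qualify(doc)
author = expanded["org.goland.schemas.projectFoo.specProposal"][
    "org.goland.schemas.projectFoo.author"
]
print(sorted(author))
```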

If the name of the root object in a JSON structure is not fully qualified then the names contained in that JSON structure MUST NOT be treated as being compliant with this specification. Note however that the presence of a fully qualified name is not sufficient to determine that a JSON structure is compliant with this proposal as it is legal to have names that include the "." character in JSON. To be sure a JSON structure is compliant one needs out of band information.
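A minimal sketch of that root check, in Python. The function name root_is_qualified is hypothetical, and requiring every root-level name to be qualified when there is more than one is an assumption. As the paragraph above says, passing the check is necessary but not sufficient: a True result does not prove the document follows this proposal.

```python
import json


def root_is_qualified(text):
    """Necessary (not sufficient) check: every root-level name has a "."."""
    root = json.loads(text)
    if not isinstance(root, dict) or not root:
        return False
    return all("." in name for name in root)


print(root_is_qualified('{"org.goland.specProposal": {"title": "x"}}'))  # True
print(root_is_qualified('{"specProposal": {"title": "x"}}'))             # False
```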

Q&A

Isn't this proposal incompatible with existing JSON systems?

Since this proposal doesn't change JSON's syntax any JSON object generated in compliance with this proposal will be processable by any existing JSON processor. Even the introduction of namespaces is not, in itself, a big deal as JSON currently says nothing about the semantics of names so sprinkling in "."s doesn't change things. What does change things however is the compression mechanism. An existing JSON processor would reasonably see "org.goland.firstName" and "firstName" as being unrelated names. But with this proposal their relationship would be defined by their relative positions in the object structure. This isn't something an existing JSON processor would know how to address. The practical ramification of this is that when the JSON processor translates the JSON structure into a programming language it won't output fully qualified names and so could cause a real mess.
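To make the problem concrete, here is what an existing processor does today, using Python's standard json module. The name org.goland.person is made up for the illustration.

```python
import json

text = '{"org.goland.person": {"firstName": "Yaron"}}'
person = json.loads(text)["org.goland.person"]

# The parser hands back the names exactly as written ...
print(list(person))                        # ['firstName']
# ... so a lookup by the fully qualified name finds nothing.
print(person.get("org.goland.firstName"))  # None
```

Any code that wants the fully qualified names has to walk the structure itself after parsing.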

Given the compatibility issue is it appropriate for systems compliant with this proposal to use the application/json MIME type?

The application/json MIME type is defined in RFC 4627. MIME types have traditionally focused on syntax, not semantics, so it's reasonable to argue that application/json is appropriate for use with this proposal since the proposal doesn't change JSON's syntax. At a minimum, though, it would seem reasonable to extend RFC 4627 to include an optional parameter indicating compliance with this extension. In any case I'm open to ideas.

Why not allow for relative namespace names?

{ "com.example.something":
  { ".foobar.somethingelse": "Isn't this neat?" }
}

One could argue that ".foobar.somethingelse" should, because it starts with a ".", be treated as a relative namespace and therefore its full namespace would be "com.example.foobar". This seemed to me to be too clever by half and so I decided not to do it.

Why not use namespace prefixes à la XML?

To use namespace prefixes we would have to add in an object whose only purpose was to define the prefixes. Then we would have to create a bogus root to contain that object. E.g.

{ "bogusroot":
  { "json.namespacePrefixDefinitions":
      { "G": "http:\/\/goland.org\/schemas\/projectfoo",
        "E": "http:\/\/example.com\/schemas" },
    "realroot":
      { "G.specProposal" :
        { "title": "JSON Extensions",
          "author": { "firstName": "Yaron",
                      "E.middleName": "Y",
                      "G.lastName": "Goland"
                    }
        }
      }
  }
}

I used the "." character instead of the ":" character to separate the prefix from its local name because I think that's more readable in JSON. But in any case note the nastiness involved in using prefixes. The reason this doesn't seem as nasty in XML is because XML has those horrific violators of data consistency – attributes. Since no such animals (thankfully) exist in JSON we have to do violence to the object model in order to enable prefixes. Hence my rejection of prefixes.

Why use DNS when you could have used URLs?

I suppose the weasel answer is that "/" is illegal unescaped in a JSON string so you would end up with:

{ "http:\/\/goland.org\/schemas\/projectfoo\/specProposal" :
  { "title": "JSON Extensions",
    "author": { "firstName": "Yaron",
                "http:\/\/example.com\/schemas\/middleName": "Y",
                "http:\/\/goland.org\/schemas\/projectfoo\/lastName": "Goland"
              }
  }
}

Which I personally think is just plain ugly. But I see nothing sacred in the JSON format and have given serious thought to proposing an alternative because I always get confused where to put curly brackets versus commas and I'm concerned that there is no way to annotate values in a string. In such an alternative I could make it legal to use "/" characters in names.

But in this case the real reason I avoided URLs is simplicity. I've never bought into the idea that a namespace name should necessarily be resolvable and I find reverse DNS names to be easier to deal with. Just personal taste I suppose. Apparently those years in Java land rubbed off on me.

Do we really need the compression mechanism?

I would dearly love to get rid of the compression mechanism because it complicates the processing model, makes it harder to pull objects out of a JSON structure, etc. But I'm deeply concerned that requiring each and every name to be fully qualified will make JSON both unreadable and unwriteable. E.g.:

{ "org.goland.schemas.projectfoo.specProposal" :
  { "org.goland.schemas.projectfoo.title": "JSON Extensions",
    "org.goland.schemas.projectfoo.author":
      { "org.goland.schemas.projectfoo.firstName": "Yaron",
        "com.example.schemas.middleName": "Y",
        "org.goland.schemas.projectfoo.lastName": "Goland"
      }
  }
}

Um…. yuck. And that's without even discussing the byte bloat.
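For a rough sense of the byte bloat, the sketch below serializes the compressed and the fully qualified form of the same structure with all optional whitespace removed and compares their sizes. The helper name wire_size is hypothetical. Four names each gain the 30-character namespace prefix, so the size difference is exactly four times the prefix length.

```python
import json

compact = {
    "org.goland.schemas.projectfoo.specProposal": {
        "title": "JSON Extensions",
        "author": {
            "firstName": "Yaron",
            "com.example.schemas.middleName": "Y",
            "lastName": "Goland",
        },
    }
}

ns = "org.goland.schemas.projectfoo."
full = {
    ns + "specProposal": {
        ns + "title": "JSON Extensions",
        ns + "author": {
            ns + "firstName": "Yaron",
            "com.example.schemas.middleName": "Y",
            ns + "lastName": "Goland",
        },
    }
}


def wire_size(doc):
    """Bytes needed for the document with all optional whitespace removed."""
    return len(json.dumps(doc, separators=(",", ":")).encode("utf-8"))


compact_bytes = wire_size(compact)
full_bytes = wire_size(full)
print(compact_bytes, full_bytes, full_bytes - compact_bytes)
```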

How can I define a name that has no namespace?

The proposal doesn't allow for that. I think that allowing for a mix of non-namespace qualified names and namespace qualified names just adds a lot of complexity for zero benefit so I require that all names be namespace qualified in order to be compliant with this proposal.



9 Comments so far

Hi Yaron,

I guess the question here is whether it’s more important that the naming system is compatible with Java or with XML (and XML applications that rely on XML namespaces, such as Atom or WebDAV). It seems to me that disallowing XML namespace URIs as JSON namespace names may make it hard to use when a service talks both JSON and XML.

For the question about resolvability of names, have a look at Norman’s latest blog entry (at http://norman.walsh.name/2006/07/25/namesAndAddresses).

Best regards, Julian

Comment by Julian Reschke 08.03.06 @ 1:13 am

Actually I hadn’t intended to be backwards compatible with either one. I explicitly view it as a non-goal to worry about how to re-use XML or Java’s namespaces.

The infoset != JSON’s data model so any attempt to map the two will either require crippling XML or making JSON as complex as XML. Both choices are unacceptable so I just don’t worry about it.

As for Norm’s article, I’m going to stay away from this issue because I already have more things to deal with than I want to. :)

Comment by Administrator 08.03.06 @ 12:38 pm

Two comments: Since we’re throwing the towel in on XML to a degree, can we also throw the towel in on schema and declaring strong types for message formats? It’s always seemed a bit odd to have a schema for something that was also supposed to be self describing.

Second, do you really need to intermingle the data from different namespaces? It’s not like this is code. Could you do something like:


{ "localfield1": "foo",
"localfield2": "bar",
"imports": { "namespace1": {"myimportedfield1":"baz"},
"namespace2": {"myotherimportedfield":"bab"}
}
}

Admittedly, I didn’t give this a lot of thought. You’d probably want a more qualified name than “imports”, but seems like this syntax is fairly compact and you get the bonus of pretty simple processing model.

Just a thought.

Comment by Pete Dapkus 08.09.06 @ 7:31 pm

I’m not sure I know what you mean by ‘strong types’. But the article I’ll be releasing in the near future on a schema for the infoset I published today will probably give you a chance to clarify.

As for ‘imports’, this format would make it difficult to directly include fields from other namespaces. They would also be ‘children’ and forced off to a corner (I suppose that makes them bad children?).

But it’s a fair point, I wonder how many schemas meaningfully include multiple namespaces?

Comment by Administrator 08.09.06 @ 10:23 pm

don’t think of it as pushed off into the corner; think of it as neatly packaged and prominently labeled.

I guess I’m wondering what you want from a schema? What does it give you that “self-describing” doesn’t give you already? Are you trying to specify a contract? Are you trying to get your messages into a strongly typed language?

If you’re going to have a schema, why not skip the overhead of self-describing?

Comment by Anonymous 08.11.06 @ 4:11 pm

An outstanding question! One I wish I heard more (although I was pretty happy when my dev lead made the same point). Why bother with self describing if you are going to have a schema?

There are a couple of answers. My personal favorite is – ease of programming and debugging.

If you had asked me 11 or 12 years ago if self describing data was important I’m not sure I’d say yes. I’d probably say ‘naah, perf matters more, encode it, compress it, whatever, so long as your debugging tools can decompress it, who cares?’

Then HTTP happened. I have to admit that I wouldn’t have believed it if I hadn’t lived through it. But it turns out that having protocols and data formats that are trivial to produce and trivial to write code to consume and to debug on the fly (ahh the power of Telnet, and yes, I, like others, have debugged HTTP via Telnet, I’m probably just old though) makes it remarkably easy for people to adopt your protocol/data format.

By way of comparison let’s look at something like EDI or even LDAP. Both use heavily schema’d binary protocols. The good news is that they are very efficient. The bad news is that you can’t use printf (or write or whatever your favorite text output is) to produce them, forget scanf for consuming them and lord help you if you ever need to debug them. Yes, you can get special tools that will decode them on the fly, if you have the schema. If you don’t have the schema (e.g. you are debugging a network and not sure what’s flying around) then you’re completely screwed, all you have is a collection of position based fields. Good luck.

Not being able to easily write, consume or debug a protocol or data format turns out to put a huge barrier to entry between the programmer and the format. Which is why I believe that HTTP and even XML took off while LDAP and EDI live in very special closed off communities. Of course given how much money goes over EDI maybe I should pray for failure but that’s a separate issue.

The other reason that self describing formats really, really matter is extensibility. Extensibility covers the stuff that isn’t in the schema. In theory you can fix this by designated extension zones. LDAP does something like this. They hacked their binary format to allow people to add in arbitrary properties. But only in certain places. Of course if the best place for your extension isn’t the place people thought would be best then you’re screwed. In theory this doesn’t matter since you can use links to hook things together but if ‘you can just link it together’ really worked we would all be using RDF. It turns out that location matters for understanding so you really want to be able to put your extensions next to the thing being extended. Not in some random location that someone decided extensions might make sense in.

And, here’s the best part, LDAP’s extension properties are self describing name/value pairs. Why self describing? Because it turns out that multiple different groups want to simultaneously expand the same entry. The schema is useless here because the groups aren’t coordinating so there is no central schema so without a self describing format (and some kind of name collision avoidance mechanism, hence namespaces) everyone ends up with an unreadable mess (bytes 15-30 are mine damnit! No, they are mine!). So if we ever want to be able to expand messages after they are released (and if HTTP taught us anything it’s don’t paint yourself into a corner) then we need self describing formats, even when we have schemas.

In fact my own interest in schemas is primarily for marshaling and extensibility. Specifically, I want to use a schema so a programmer can say “This is the message I’m expecting and here’s how I want to rip it into memory” and I also want the programmer to be able to use the schema as a filter to rip out any extensions they don’t support so their code doesn’t blow up if they are sent a message with an extension that didn’t exist when the code was written.
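The “schema as a filter” idea could look something like the Python sketch below. Everything in it is hypothetical: the schema representation (a nested dict of known names), the function name strip_unknown and the example names are made up for illustration, and the names are assumed to be fully qualified already.

```python
def strip_unknown(message, schema):
    """Drop every name the schema does not list.

    schema maps each known name to a nested schema (dict) for object
    values, or to None for values that should be kept as they are.
    """
    if not isinstance(message, dict) or schema is None:
        return message
    return {
        name: strip_unknown(child, schema[name])
        for name, child in message.items()
        if name in schema
    }


schema = {"org.goland.author": {"org.goland.firstName": None,
                                "org.goland.lastName": None}}
message = {"org.goland.author": {"org.goland.firstName": "Yaron",
                                 "com.example.shoeSize": 11,
                                 "org.goland.lastName": "Goland"}}
print(strip_unknown(message, schema))
# {'org.goland.author': {'org.goland.firstName': 'Yaron', 'org.goland.lastName': 'Goland'}}
```

Because the extension name carries its own namespace, it can be dropped without the receiving code needing to know who defined it.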

So, to summarize (this really should be a blog article) I want a self describing format even when I have a schema because it makes it easier to write data, read data, debug data and extend data.

Comment by Administrator 08.11.06 @ 10:15 pm

Yes, I was trying to draw you out a bit. No question that self-describing is a big win. I guess I’m more inclined to go the “no schema” route, just because it seems like expressing a message format that is just extensible enough and no more is a bit of a fool’s errand. Neither the producer nor the consumer can really do it alone.

In a previous life, I did a fair amount of work with ASN.1 (including working on the spec for an ASN.1 based standard). In my original comment, I was proposing designated extensions zones for JSON.

if it didn’t violate my requirement of assuming nothing of the receiver, I’d also propose mustUnderstand for imported fields.

(I suppose you could kludge one by making a wrapper object in another namespace, adding your field, and making the original message a field on the wrapper.)

Comment by Pete Dapkus 08.16.06 @ 4:57 pm

The schema is a helper object, it should never be required, but it can be useful.

If you are using designated extension zones then why have a structured data format at all? Why not just have flat name/value pairs and use links for structure (à la RDF)? If structure matters then placing your extensions in the right place in the structure matters. Extension zones, by definition, allow extensions only in pre-approved places. So if the right place for your extension is somewhere other than the blessed extension zone then you’re SOL. I think you either have to pick an RDF style approach or you have to allow arbitrary extensibility, but you can’t have it both ways.

I actually am a big fan of the wrapper object approach. If you are changing something in a manner that isn’t backwards compatible (and therefore needs a must understand to point this out) then you should change the parent element because you are breaking the contract.

Comment by Administrator 08.16.06 @ 6:43 pm

I came across your page searching for JSON schema conventions, not namespaces, but anyway just a note:

You wrote, ‘"/" is illegal unescaped in a JSON string’. I’m looking at RFC 4627 which lists this ABNF rule:

unescaped = %x20-21 / %x23-5B / %x5D-10FFFF

‘/’ is %x2F, which falls in the second legal range for unescaped string characters. Similarly, http://www.json.org lists legal unescaped characters as:

any-Unicode-except-"-or-\-or-control

‘/’ is neither ‘"’ nor ‘\’ nor a control character, so it is also legal by this definition.

Comment by Anthony Carrico 10.27.06 @ 7:31 am
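The point in the comment above is easy to check with Python's standard json module: a name containing unescaped "/" characters parses without complaint, and the escaped form decodes to exactly the same name.

```python
import json

plain = json.loads('{"http://goland.org/schemas/projectfoo/specProposal": 1}')
escaped = json.loads(r'{"http:\/\/goland.org\/schemas\/projectfoo\/specProposal": 1}')

print(list(plain))       # ['http://goland.org/schemas/projectfoo/specProposal']
print(plain == escaped)  # True
```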


