Adding Namespaces to JSON

As part of my job I am thinking deep thoughts about what protocols Windows Live should expose and right now I'm pushing hard for JSON to be a premier protocol data format. I like JSON because it makes it extremely easy to persist hierarchical data structures which account for the bulk of the messages that Windows Live needs to move around. But JSON does have a number of issues that I think need to be addressed, specifically: namespaces, extensibility, schema/type support and relative linking. In this article I make a proposal for how to address namespaces. I will address the other issues in future articles.

The Problem

If two groups both create a name "firstName" and each gives it a different syntax and semantics how is someone handed a JSON document supposed to know which group's syntax/semantics to apply? In some cases there might be enough context (e.g. the data was retrieved from one of the group's servers) to disambiguate the situation but it is increasingly common for distributed services to be created where the original source of some piece of information can trivially be lost somewhere down the processing chain. It therefore would be extremely useful for JSON documents to be 'self describing' in the sense that one can look at any name in a JSON document in isolation and have some reasonable hope of determining if that particular name represents the syntax and semantics one is expecting.

The Proposed Solution

It is proposed that JSON names be defined as having two parts, a namespace name and a local name. The two are combined as namespace name + "." + local name to form a fully qualified JSON name. Namespace names MAY contain the "." character. Local names MUST NOT contain the "." character. Namespace names MUST consist of the reverse listing of subdomains in a fully qualified DNS name. E.g. org.goland or com.example.bigfatorg.definition.
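Because local names may not contain ".", the split point of a fully qualified name is unambiguous: everything after the last "." is the local name. A minimal sketch of that rule in JavaScript (the helper name `splitQualifiedName` is mine, not part of the proposal):

```javascript
// Split a fully qualified JSON name into its namespace name and local
// name. Per the rule above, local names MUST NOT contain ".", so the
// local name is simply everything after the last ".".
function splitQualifiedName(name) {
  const lastDot = name.lastIndexOf(".");
  if (lastDot === -1) {
    // No "." at all: the name is unqualified (local name only).
    return { namespace: null, localName: name };
  }
  return {
    namespace: name.slice(0, lastDot),
    localName: name.slice(lastDot + 1)
  };
}
```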

To enable space savings and to increase both the readability and write-ability of JSON a JSON name MAY omit its namespace name along with the "." character that concatenated it to its local name. In this case the namespace of the name is logically set to the namespace of the name's parent object. E.g.

{ "org.goland.schemas.projectFoo.specProposal" :
   { "title": "JSON Extensions",
     "author": { "firstName": "Yaron",
                 "com.example.schemas.middleName": "Y",
                 "org.goland.schemas.projectFoo.lastName": "Goland"
               }
   }
}

In the previous example the name firstName, because it lacks a namespace, takes on its parent object's namespace. That parent is author, which also lacks a namespace, so recursively author looks to its parent specProposal, which does have a namespace: org.goland.schemas.projectFoo. middleName introduces a new namespace, "com.example.schemas"; if its value were an object then the names in that object would inherit the com.example.schemas namespace. Because the use of the compression mechanism is optional, the lastName value can be fully qualified even though it shares the same namespace as its parent.
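The inheritance rule can be sketched as a single expansion pass over a parsed structure. `qualify` is a hypothetical helper name, not part of the proposal:

```javascript
// Recursively rewrite every name in a parsed JSON object to its fully
// qualified form. A name containing "." is treated as already
// qualified; a bare name inherits the namespace of its parent object's
// name. (Sketch only; "qualify" is a made-up helper name.)
function qualify(obj, parentNamespace) {
  if (obj === null || typeof obj !== "object" || Array.isArray(obj)) {
    return obj; // only object members carry names
  }
  const out = {};
  for (const name of Object.keys(obj)) {
    let fullName, ns;
    if (name.includes(".")) {
      fullName = name;
      ns = name.slice(0, name.lastIndexOf(".")); // children inherit this
    } else if (parentNamespace === null) {
      // Root names are required to be fully qualified; leave a bare
      // root name untouched rather than invent a namespace for it.
      fullName = name;
      ns = null;
    } else {
      fullName = parentNamespace + "." + name;
      ns = parentNamespace;
    }
    out[fullName] = qualify(obj[name], ns);
  }
  return out;
}
```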

If the name of the root object in a JSON structure is not fully qualified then the names contained in that JSON structure MUST NOT be treated as being compliant with this specification. Note however that the presence of a fully qualified name is not sufficient to determine that a JSON structure is compliant with this proposal as it is legal to have names that include the "." character in JSON. To be sure a JSON structure is compliant one needs out of band information.
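A necessary-but-not-sufficient screen, then, is to check whether the root names are fully qualified; true compliance still requires out-of-band knowledge. A sketch (`mayBeCompliant` is a name I made up):

```javascript
// Necessary-but-not-sufficient screen: per the rule above, a structure
// whose root names are not fully qualified MUST NOT be treated as
// compliant. Passing this check still doesn't prove compliance, since
// plain JSON names may legally contain ".".
function mayBeCompliant(root) {
  if (root === null || typeof root !== "object" || Array.isArray(root)) {
    return false;
  }
  const names = Object.keys(root);
  return names.length > 0 && names.every(n => n.includes("."));
}
```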

Q&A

Isn't this proposal incompatible with existing JSON systems?

Since this proposal doesn't change JSON's syntax any JSON object generated in compliance with this proposal will be processable by any existing JSON processor. Even the introduction of namespaces is not, in itself, a big deal as JSON currently says nothing about the semantics of names so sprinkling in "."s doesn't change things. What does change things however is the compression mechanism. An existing JSON processor would reasonably see "org.goland.firstName" and "firstName" as being unrelated names. But with this proposal their relationship would be defined by their relative positions in the object structure. This isn't something an existing JSON processor would know how to address. The practical ramification of this is that when the JSON processor translates the JSON structure into a programming language it won't output fully qualified names and so could cause a real mess.

Given the compatibility issue is it appropriate for systems compliant with this proposal to use the application/json MIME type?

The application/json MIME type is defined in RFC 4627. MIME types have traditionally focused on syntax, not semantics, so it's reasonable to argue that application/json is appropriate for use with this proposal since the proposal doesn't change the JSON syntax. Still, at a minimum it would seem reasonable to extend RFC 4627 to include an optional parameter indicating compliance with this extension. In any case I'm open to ideas.

Why not allow for relative namespace names?

{ "com.example.something":
   { ".foobar.somethingelse": "Isn't this neat?" }
}

One could argue that ".foobar.somethingelse" should, because it starts with a ".", be treated as a relative namespace and therefore its full namespace would be "com.example.foobar". This seemed to me to be too clever by half and so I decided not to do it.

Why not use namespace prefixes ala XML?

To use namespace prefixes we would have to add in an object whose only purpose was to define the prefixes. Then we would have to create a bogus root to contain that object. E.g.

{ "bogusroot":
   { "json.namespacePrefixDefinitions":
       { "G": "http:\/\/goland.org\/schemas\/projectfoo",
         "E": "http:\/\/example.com\/schemas" },
     "realroot":
       { "G.specProposal" :
           { "title": "JSON Extensions",
             "author": { "firstName": "Yaron",
                         "E.middleName": "Y",
                         "G.lastName": "Goland"
                       }
           }
       }
   }
}

I used the "." character instead of the ":" character to separate the prefix from its local name because I think that's more readable in JSON. But in any case note the nastiness involved in using prefixes. The reason this doesn't seem as nasty in XML is because XML has those horrific violators of data consistency – attributes. Since no such animals (thankfully) exist in JSON we have to do violence to the object model in order to enable prefixes. Hence my rejection of prefixes.

Why use DNS when you could have used URLS?

I suppose the weasel answer is that "/" is illegal unescaped in a JSON string so you would end up with:

{ "http:\/\/goland.org\/schemas\/projectfoo\/specProposal" :
   { "title": "JSON Extensions",
     "author": { "firstName": "Yaron",
                 "http:\/\/example.com\/schemas\/middleName": "Y",
                 "http:\/\/goland.org\/schemas\/projectfoo\/lastName": "Goland"
               }
   }
}

Which I personally think is just plain ugly. But I see nothing sacred in the JSON format and have given serious thought to proposing an alternative because I always get confused where to put curly brackets versus commas and I'm concerned that there is no way to annotate values in a string. In such an alternative I could make it legal to use "/" characters in names.

But in this case the real reason I avoided URLs is simplicity. I've never bought into the idea that a namespace name should necessarily be resolvable and I find reverse DNS names to be easier to deal with. Just personal taste I suppose. Apparently those years in Java land rubbed off on me.

Do we really need the compression mechanism?

I would dearly love to get rid of the compression mechanism because it complicates the processing model, makes it harder to pull objects out of a JSON structure, etc. But I'm deeply concerned that requiring each and every name to be fully qualified will make JSON both unreadable and unwriteable. E.g.:

{ "org.goland.schemas.projectfoo.specProposal" :
   { "org.goland.schemas.projectfoo.title": "JSON Extensions",
     "org.goland.schemas.projectfoo.author":
       { "org.goland.schemas.projectfoo.firstName": "Yaron",
         "com.example.schemas.middleName": "Y",
         "org.goland.schemas.projectfoo.lastName": "Goland"
       }
   }
}

Um…. yuck. And that's without even discussing the byte bloat.

How can I define a name that has no namespace?

The proposal doesn't allow for that. I think that allowing for a mix of non-namespace qualified names and namespace qualified names just adds a lot of complexity for zero benefit so I require that all names be namespace qualified in order to be compliant with this proposal.

32 thoughts on “Adding Namespaces to JSON”

  1. Hi Yaron,

    I guess the question here is whether it’s more important that the naming system is compatible with Java or with XML (and XML applications that rely on XML namespaces, such as Atom or WebDAV). It seems to me that disallowing XML namespace URIs as JSON namespace names may make it hard to use when a service talks both JSON and XML.

    For the question about resolvability of names, have a look at Norman’s latest blog entry (at http://norman.walsh.name/2006/07/25/namesAndAddresses).

    Best regards, Julian

  2. Actually I hadn’t intended to be backwards compatible with either one. I explicitly view it as a non-goal to worry about how to re-use XML or Java’s namespaces.

    The infoset != JSON’s data model so any attempt to map the two will either require crippling XML or making JSON as complex as XML. Both choices are unacceptable so I just don’t worry about it.

    As for Norm’s article, I’m going to stay away from this issue because I already have more things to deal with than I want to. :)

  3. Two comments: Since we’re throwing the towel in on XML to a degree, can we also throw the towel in on schema and declaring strong types for message formats? It’s always seemed a bit odd to have a schema for something that was also supposed to be self describing.

    Second, do you really need to intermingle the data from different namespaces? It’s not like this is code. Could you do something like:


    { "localfield1": "foo",
    "localfield2": "bar",
    "imports": { "namespace1": {"myimportedfield1":"baz"},
    "namespace2": {"myotherimportedfield":"bab"}
    }
    }

    Admittedly, I didn’t give this a lot of thought. You’d probably want a more qualified name than “imports”, but seems like this syntax is fairly compact and you get the bonus of pretty simple processing model.

    Just a thought.

  4. I’m not sure I know what you mean by ‘strong types’. But the article I’ll be releasing in the near future on a schema for the infoset I published today will probably give you a chance to clarify.

    As for ‘imports’, this format would make it difficult to directly include fields from other namespaces. They would also be ‘children’ and forced off to a corner (I suppose that makes them bad children?).

    But it’s a fair point, I wonder how many schemas meaningfully include multiple namespaces?

  5. don’t think of it as pushed off into the corner; think of it as neatly packaged and prominently labeled.

    I guess I’m wondering what you want from a schema? What does it give you that “self-describing” doesn’t give you already? Are you trying to specify a contract? Are you trying to get your messages into a strongly typed language?

    If you’re going to have a schema, why not skip the overhead of self-describing?

  6. An outstanding question! One I wish I heard more (although I was pretty happy when my dev lead made the same point). Why bother with self describing if you are going to have a schema?

    There are a couple of answers. My personal favorite is – ease of programming and debugging.

    If you had asked me 11 or 12 years ago if self describing data was important I’m not sure I’d say yes. I’d probably say ‘naah, perf matters more, encode it, compress it, whatever, so long as your debugging tools can decompress it, who cares?’

    Then HTTP happened. I have to admit that I wouldn’t have believed it if I hadn’t lived through it. But it turns out that having protocols and data formats that are trivial to produce and trivial to write code to consume and to debug on the fly (ahh the power of Telnet, and yes, I, like others, have debugged HTTP via Telnet, I’m probably just old though) makes it remarkably easy for people to adopt your protocol/data format.

    By way of comparison let’s look at something like EDI or even LDAP. Both use heavily schema’d binary protocols. The good news is that they are very efficient. The bad news is that you can’t use printf (or write or whatever your favorite text output is) to produce them, forget scanf for consuming them and lord help you if you ever need to debug them. Yes, you can get special tools that will decode them on the fly, if you have the schema. If you don’t have the schema (e.g. you are debugging a network and not sure what’s flying around) then you’re completely screwed, all you have is a collection of position based fields. Good luck.

    Not being able to easily write, consume or debug a protocol or data format turns out to put a huge barrier to entry between the programmer and the format. Which is why I believe that HTTP and even XML took off while LDAP and EDI live in very special closed off communities. Of course given how much money goes over EDI maybe I should pray for failure but that’s a separate issue.

    The other reason that self describing formats really, really matter is extensibility. Extensibility covers the stuff that isn’t in the schema. In theory you can fix this by designated extension zones. LDAP does something like this. They hacked their binary format to allow people to add in arbitrary properties. But only in certain places. Of course if the best place for your extension isn’t the place people thought would be best then you’re screwed. In theory this doesn’t matter since you can use links to hook things together but if ‘you can just link it together’ really worked we would all be using RDF. It turns out that location matters for understanding so you really want to be able to put your extensions next to the thing being extended. Not in some random location that someone decided extensions might make sense in.

    And, here’s the best part, LDAP’s extension properties are self describing name/value pairs. Why self describing? Because it turns out that multiple different groups want to simultaneously expand the same entry. The schema is useless here because the groups aren’t coordinating so there is no central schema so without a self describing format (and some kind of name collision avoidance mechanism, hence namespaces) everyone ends up with an unreadable mess (bytes 15-30 are mine damnit! No, they are mine!). So if we ever want to be able to expand messages after they are released (and if the HTTP taught us anything it’s don’t paint yourself into a corner) then we need self describing formats, even when we have schemas.

    In fact my own interest in schemas is primarily for marshaling and extensibility. Specifically, I want to use a schema so a programmer can say “This is the message I’m expecting and here’s how I want to rip it into memory” and I also want the programmer to be able to use the schema as a filter to rip out any extensions they don’t support so their code doesn’t blow up if they are sent a message with an extension that didn’t exist when the code was written.

    So, to summarize (this really should be a blog article) I want a self describing format even when I have a schema because it makes it easier to write data, read data, debug data and extend data.

  7. Yes, I was trying to draw you out a bit. No question that self-describing is a big win. I guess I’m more inclined to go the “no schema” route, just because it seems like expressing a message format that is just extensible enough and no more is a bit of a fool’s errand. Neither the producer nor the consumer can really do it alone.

    In a previous life, I did a fair amount of work with ASN.1 (including working on the spec for an ASN.1 based standard). In my original comment, I was proposing designated extensions zones for JSON.

    If it didn’t violate my requirement of assuming nothing of the receiver, I’d also propose mustUnderstand for imported fields.

    (I suppose you could kludge one by making a wrapper object in another namespace, adding your field, and making the original message a field on the wrapper.)

  8. The schema is a helper object, it should never be required, but it can be useful.

    If you are using designated extension zones then why have a structured data format at all? Why not just have flat name/value pairs and use links for structure (ala RDF)? If structure matters then placing your extensions in the right place in the structure matters. Extension zones, by definition, allow extensions only in pre-approved places. So if the right place for your extension is somewhere other than the blessed extension zone then you’re SOL. I think you either have to pick an RDF style approach or you have to allow arbitrary extensibility, but you can’t have it both ways.

    I actually am a big fan of the wrapper object approach. If you are changing something in a manner that isn’t backwards compatible (and therefore needs a must understand to point this out) then you should change the parent element because you are breaking the contract.

  9. I came across your page searching for JSON schema conventions, not namespaces, but anyway just a note:

    You wrote, ‘”/” is illegal unescaped in a JSON string’. I’m looking at RFC 4627 which lists this ABNF rule:

    unescaped = %x20-21 / %x23-5B / %x5D-10FFFF

    ‘/’ is %x2F, which falls in the second legal range for unescaped string characters. Similarly, http://www.json.org lists legal unescaped characters as:

    any-Unicode-except-“-or-\-or-control

    ‘/’is neither ‘”‘ nor ‘\’ nor a control character, so it is also legal by this definition.

  10. I didn’t read through everything but why not have the namespace thing as
    { "namespace" : { "G" : { "name" : "org.foo", "src" : "http:\/etc" } }, "data" : null, "done" : false }
    then compliant JSON parser can change the “done” value to true if it uses namespacing, if it’s false it’s down to the script to accommodate for it.

    1. What happens when we mix namespaces? E.g. maybe first name and last name are from standard namespace but address is not? In that case we would have to write JSON parsers that could handle seeing a namespace declaration just about anywhere. That strikes me as really complex to handle.

  11. var j = JSON.parse(jsonstring);
    if (j.done) {
    // Continue with script
    } else { j = null; alert('Cannot use JSON namespacing! Ending script.'); }

  12. oops forgot about mixing, that G is for the mixing e.g
    { "ns.G" : { "fname" : "Lee", "sname" : "Shallis", "ns.A" : { "line1" : "## Flat Number", "line2" : "## Street Number" } }, "G.age" : 21 }

  13. 3rd try (not seeing my post show up).
    { "ns.std" : { "fname" : "Aoi", "Madeup", "ns.p" : { "aline1" : "67 Example Road", "city" : "Example", "zip" : "ee10 1ee" }
    ns would refer to the namespace object and could be used like this as well: { "namespace" : {
    "std" : { "name" : "org", "src" : "etc" }, "p" : { "name" : "std.foo" },
    "e" : { "name" : "etc.foo", "src" : "etc" }

  14. Right but this means that on every element the parser has to constantly be on the lookout for namespaces. This blocks the use of JSON parsers that just translate to some local class, they need a whole extra layer of logic. I’m not sure the feature is worth the trouble. If we just use globally unique names then we work both with namespace and non-namespace aware parsers. They both ‘see’ exactly the same structure with the same values.

  15. it’s not really that much of a problem, e.g
    if (name.indexOf('_') >= 0) {
    t = json.namespace[name.split('_')[1]].ns; // ns &= global[name];
    for (i = 0;i < t.names.length;i++) {
    tmp = t.names[i];
    if (!value[tmp] || typeof value[tmp] !== typeof t[tmp]) { value[tmp] = 'done;' + t[tmp]; }
    else { value[tmp] = 'done;' + value[tmp]; }
    }
    } else {
    tmp = new RegExp('^done\\;');
    if (tmp.test(value)) { value = value.slice(value.indexOf(';') + 1); }
    // continue code
    }
    Which most likely will only add 2–5 seconds to an average JSON structure. I used _ instead of . so old JSON parsers won’t create and use a sub object instead of a bunch of properties.

  16. In the contexts I am looking at JSON I am usually parsing it in a language other than ECMAScript. In fact, I’m usually parsing it using some off the shelf JSON translator and that translator is usually using reflection to build some native class. Unless I’m going to edit all of those translators in all of those languages to become namespace aware then this approach doesn’t work.

  17. That wouldn’t be necessary, the JSON standard states that all JSON parsers should provide the option of using a function to do some last second checks/changes on the data, you can just make your function set up the namespaces and then null out the original namespace data so that the JSON parser does not go further into the unneeded section (which is what I intended with the above function).
    I’m only using JavaScript because I haven’t learnt any others yet.

    1. I couldn’t find anything in RFC 4627 about clean up, but I was just scanning. But it really doesn’t matter. The key is that if we use DNS names then all JSON parsers just work. If we use anything else then anyone wanting to use the JSON has to do some pre-parsing which given the nature of the JSON community is, I suspect, too much of a hurdle. I’m not even sure it’s worth the effort. Experience from XML teaches that people will ignore namespaces or implement them wrong. By just using simple strings we avoid all the trouble.

  18. My point was that provided the JSON parser does not support namespacing correctly then the optional function that can be used to convert strings to data objects and so on can be used to do self implementation or a standard function can be made that calls the user function once after correcting what the JSON parser did not.
    eg. ajax(‘example.json’, standardfunction(name, value, userfunction));

    1. And if a programmer is using Java or .Net or what have you where they receive the JSON directly and feed it to their JSON parser expecting to get out nice Java/.NET/etc. classes? Do we tell all of them that they have to build a preprocessor first? That doesn’t sound like a successful strategy to me. The more you make the devs understand the more likely they are to get it wrong. We saw this all the time in XML where developers would ignore the namespace feature and instead hard code in the namespace tags thus resulting in all sorts of fun chaos. For systems to be robust they must ‘fail safe’ in the face of predictable dev behavior.

  19. It should still come out fine eg:
    j = JSONParser(JSONString, NSJSON(name, value, USERJSON))
    j = JSONParser(JSONString, USERJSON(name, value))
    Top for namespaced JSON and bottom for normal JSON.
    NSJSON would do a quick adaptation of the JSON so that USERJSON can treat it as normal. Either way they get normal classes returned to j.

    1. I think we are talking past each other. I understand that the JSON can be transformed. But it is my personal belief (I can’t prove I’m right) based on what I saw with XML that most people will screw up the namespace translation and as a consequence the namespace mechanism will fail in practice even though it should work in theory. That’s why I went for globally unique names instead since they require zero additional code, they just work, no effort required.

  20. The point in namespaces is to be templates that say what data is permitted and what value to give data that is not defined. From my understanding, what you’re trying to say is that the implementation should be changed to global objects rather than templates.

  21. That might be easy but I don’t see how that would be better than the original purpose of namespaces.
    People who use them badly are just amateurs and likely won’t have a server load big enough to be concerned with doing it right.

    1. Because they have no choice but to get it right. If they want to parse incoming data then they have to parse it with the full name. No shortcuts would work (other than position based which yes, people will do). With namespaces people just ignore the namespace and parse the names directly with the result that if you have name collision things just fail. By having every name be unique this isn’t an issue.

  22. How would it fail if the object/property and namespace had the same name? For a name to be identified as a namespace, a dot character (or _ in my suggestion) would first need to be present, which a normal name does not have.

    1. Because people would just ignore the . and the _. The moral equivalent happened with XML namespaces.

  23. Namespace ids can be used, if the object isn’t created then the namespace id will not be loaded into the parent object. Even if the programmer doesn’t test for the existence v.NAME the script will fail to continue because it has no data to work with. e.g:
    ns : { g : { id : 'person', name : 'org.foo', src : 'URL' } }, data : { ns_g : { fname : 'Lee' } } // ns_g would tell the parser to create an object called person in data using ns_g data as the default properties and their corresponding values
    // then…
    if (json.person) {
    // continue
    } else {
    // root is not data
    } // for arrays the parser would wrap an object around person before adding it to the array

    1. All of which makes the JSON 10x more painful to deal with. How much easier it would be to just parse the JSON in as is, no translations and be done with it. Because names are DNS they are all unique and even human readable. With the containment proposal every path gets long and longer. But oh well, I think we can just agree to disagree.

  24. Yes, being a programmer I completely disagree with you because I can already envision the code which is not all that big and can simply make use of the user function that current parsers already permit.
    All the developer would have to do is check for the namespace node on the root in his own function which actually makes it easy for him/her to bypass wrong json and use a variable string to capture the name correctly, for instance:
    t = '';
    if (root.namespace) { t = 'ns_'; }
    switch (name) {
    case 'plist': l = obj[name];
    for (i = 0…) { o = l[i][t + 'person'];… }
    } // not hard and the middle function would simply be a modified version of the one further above.
