WebDAV, DASL, XQUERY and XPATH 2.0
Sunday April 18th 2004, 12:00 am
Filed under: Internet Protocols

The Web's slow but inexorable movement from a read-only to a collaborative environment is increasing WebDAV's success. But WebDAV still has a serious functional gap: search. The DASL community has been keeping hope alive by continuing to work on a search grammar for WebDAV. But much as WebDAV adopted XML both to solve real problems and to ride on the coat tails of XML's success, DASL could solve a number of serious technical issues, increase its own visibility, and leverage the excitement and investment in the XPATH/XQUERY community if it adopted a profile of XPATH 2.0 as its basic search grammar. In the article below I discuss some of the details of how DASL could use XPATH.

The Background

WebDAV is a very cool technology that I summarized briefly in an article I wrote a while back. To quickly recap, WebDAV is an HTTP-based protocol for accessing data. It defines how to surf the data's hierarchy, how to add/edit/copy/move/delete both folders and files in that hierarchy, and how to set/get properties belonging to folders/files in the hierarchy. It works great for accessing e-mail, file systems, directories, databases, whatever. It's a standard, it's widely adopted and it's very nifty.

The Problem

WebDAV has a weakness that has been causing me some problems as I promote its use – Search.

The DASL effort to add search to WebDAV has been going on for some time now, and the latest draft, dated January of this year, is keeping hope alive. But there is a technique that was used to outstanding effect with WebDAV that I think could really help DASL get a lot of community interest and support: ride on someone else's coat tails. In the case of WebDAV the coat tails we rode on were XML's. WebDAV was the first protocol I'm aware of to use XML and this got us tons of free interest and publicity. Thankfully XML added real value to WebDAV so the standards committee can be proud of its pioneering efforts.

The Pitch

Today there are new coat tails that I think DASL could ride on: XPATH/XQUERY. After more years than anyone cares to count it looks like XPATH/XQUERY will finally finish sometime this decade. Meanwhile everyone is already investing in XQUERY. At last count there were 29 or so implementations, several of which are open source. JSR 225, which will standardize an API for XQUERY, was founded by Oracle and IBM and is supported by BEA and Sun. BEA, Microsoft and Oracle all have XPATH/XQUERY implementations. The XPATH/XQUERY bandwagon is clearly one worth being on and one that could really help pump up interest in DASL. And, much as with XML's contribution to WebDAV, XPATH/XQUERY has a ton of real value to add to DASL.

I realize that XPATH/XQUERY is significantly more complex than DASL's current basic query, but given the enormous number of existing implementations, including open source ones, I do not believe that the complexity of XPATH/XQUERY need be a burden on the WebDAV community. Equally importantly, XPATH/XQUERY has already figured out a lot of very hard problems regarding how to search XML. By using XPATH/XQUERY, the DASL community would avail itself of the enormous experience built right into XPATH/XQUERY.

The Details

What I am suggesting is that DASL make one big change: replace its basic grammar with some profile of XPATH 2.0. This, of course, raises the question of exactly what an XPATH 2.0 expression submitted in a WebDAV SEARCH method would be querying. After all, XPATH expects to be applied against something. What is the 'something' that it would be searching?

I propose that XPATH be applied against a representation of the WebDAV resource space as an XML infoset. An XML infoset is essentially an abstract representation of the XML object model. It consists of a hierarchy whose entries are elements, attributes, text, etc., and it allows one to model XML data independent of the actual XML serialization.

So what my proposal boils down to is creating a virtual XML document, represented by the infoset, whose contents are the names, properties and URIs of the resources within a WebDAV search space. The XPATH would then be applied against that virtual document. An XML serialization of the virtual document, something which would never be created in the real world and is only included here to make it easier for the reader to visualize what I'm talking about, could look like:

<root xmlns="dav:infoset">
  <resource>
    <name>http://example.com/</name>
    <properties>
      <displayname>example.com root</displayname>
      <resourcetype><collection/></resourcetype>
    </properties>
    <children>
      <name>http://example.com/index.html</name>
    </children>
  </resource>
  <resource>
    <name>http://example.com/index.html</name>
    <properties>
      <getcontenttype>text/html</getcontenttype>
      <resourcetype/>
    </properties>
    <representation>
      <properties>
        <getcontenttype>text/html</getcontenttype>
        <getcontentlanguage>en</getcontentlanguage>
      </properties>
      <bodyURI>http://example.com/index.html;lang=en</bodyURI>
    </representation>
    <representation>
      <properties>
        <getcontenttype>text/html</getcontenttype>
        <getcontentlanguage>de</getcontentlanguage>
      </properties>
      <bodyURI>http://example.com/index.html;lang=de</bodyURI>
    </representation>
  </resource>
</root>

Queries would be executed against this infoset as if it really existed. In reality the WebDAV server would take the incoming query and translate it into a form that matched how the system actually stored information. It bears repeating that the infoset is a conceptual not an actual entity. I'm not suggesting that WebDAV servers go create a huge XML document with all their data in it and then run XQUERY against it.
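
To make the idea of querying the virtual document a little more concrete, here is a minimal Python sketch that runs two path queries against a trimmed copy of the serialization. It uses the standard library's ElementTree, which implements only a small subset of XPATH 1.0, so it is a stand-in for a real XPATH 2.0 engine and not a suggestion for how a server would do it. The `d` prefix and the variable names are my own choices for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the illustrative serialization; a real server would never build this.
INFOSET = """\
<root xmlns="dav:infoset">
  <resource>
    <name>http://example.com/</name>
    <properties>
      <resourcetype><collection/></resourcetype>
    </properties>
    <children>
      <name>http://example.com/index.html</name>
    </children>
  </resource>
  <resource>
    <name>http://example.com/index.html</name>
    <properties>
      <resourcetype/>
    </properties>
  </resource>
</root>
"""

NS = {"d": "dav:infoset"}  # prefix chosen for this sketch only
root = ET.fromstring(INFOSET)

# Roughly: /d:root/d:resource[d:properties/d:resourcetype/d:collection]/d:name
# ElementTree cannot nest a path inside a predicate, so the filter is done in Python.
collection_names = [
    res.findtext("d:name", namespaces=NS)
    for res in root.findall("d:resource", NS)
    if res.find("d:properties/d:resourcetype/d:collection", NS) is not None
]

# Roughly: /d:root/d:resource[d:name = 'http://example.com/']/d:children/d:name
children_of_root = [
    name.text
    for name in root.findall(
        "d:resource[d:name='http://example.com/']/d:children/d:name", NS
    )
]

print(collection_names)
print(children_of_root)
```

A real implementation would instead translate the same expressions into whatever lookups its own store supports.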

Queries that wanted to query both the values of the name/property space and the contents of actual resources could use XPATH 2.0's doc function. This function retrieves the contents of the resource it is pointed at and makes them available for querying. (The similarly named document-uri function only reports the URI of a document that has already been loaded.) See the Q&A in the appendix for a discussion of why I chose this approach.
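
As a rough illustration of content search, the Python sketch below looks up each bodyURI, loads the body it points at, and checks the body's text. Everything outside the infoset element names is an assumption made for the example: the `BODIES` dictionary is a hypothetical stand-in for however a server resolves a bodyURI, and `load_body` plays the role of the XPATH function that fetches a document by URI.

```python
import xml.etree.ElementTree as ET

NS = {"d": "dav:infoset"}  # prefix chosen for this sketch only

INFOSET = """\
<root xmlns="dav:infoset">
  <resource>
    <name>http://example.com/index.html</name>
    <properties><resourcetype/></properties>
    <representation>
      <properties/>
      <bodyURI>http://example.com/index.html;lang=en</bodyURI>
    </representation>
    <representation>
      <properties/>
      <bodyURI>http://example.com/index.html;lang=de</bodyURI>
    </representation>
  </resource>
</root>
"""

# Hypothetical body store keyed by bodyURI (assumption, not part of the proposal).
BODIES = {
    "http://example.com/index.html;lang=en": "<html><body>Welcome</body></html>",
    "http://example.com/index.html;lang=de": "<html><body>Willkommen</body></html>",
}


def load_body(body_uri):
    """Stand-in for the XPATH function that fetches a document by URI."""
    return ET.fromstring(BODIES[body_uri])


def resources_with_body_text(root, needle):
    """Names of resources with at least one representation whose body contains needle."""
    hits = []
    for res in root.findall("d:resource", NS):
        for uri in res.findall("d:representation/d:bodyURI", NS):
            body_text = "".join(load_body(uri.text).itertext())
            if needle in body_text:
                hits.append(res.findtext("d:name", namespaces=NS))
                break  # one matching representation is enough
    return hits


root = ET.fromstring(INFOSET)
print(resources_with_body_text(root, "Willkommen"))
```

Bodies are only touched when a query asks for them, which is the point of keeping them out of the infoset proper.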

At some point full XQUERY support would be needed, but I would remind the reader of an old design rule: the spec is done when there is nothing left to cut. A standardized DASL specification with the current query language negotiation features and a 'basic grammar' based on XPATH 2.0 would be very useful and well worth standardizing. One could then have a later extension spec that added in full XQUERY support as a new pluggable query language option.
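
For readers who want to picture the wire format, here is a hedged Python sketch that builds a SEARCH request body carrying an XPATH expression. DASL wraps whichever grammar is in use inside a DAV:searchrequest element; the `xpath` grammar element and its namespace below are hypothetical, invented purely for this example, and how the expression's names would bind to the infoset namespace is glossed over.

```python
import xml.etree.ElementTree as ET

DAV_NS = "DAV:"
# Hypothetical namespace for an XPATH-based grammar; no spec defines this.
XPATH_GRAMMAR_NS = "http://example.com/dasl-xpath"


def build_search_body(xpath_expression):
    """Wrap an XPATH expression in a DAV:searchrequest element."""
    request = ET.Element(f"{{{DAV_NS}}}searchrequest")
    grammar = ET.SubElement(request, f"{{{XPATH_GRAMMAR_NS}}}xpath")
    grammar.text = xpath_expression
    return ET.tostring(request, encoding="unicode")


# Ask for the names of all collections in the search space.
body = build_search_body("/root/resource[properties/resourcetype/collection]/name")
print(body)
```

Because the grammar is just a child element of the request, a later XQUERY grammar could be plugged in the same way without touching the rest of the framework.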

The Conclusion

As the Web finally, slowly, moves from a read-only to a truly collaborative environment, WebDAV is increasingly finding an audience that realizes it solves real problems. Unfortunately search is still a chink in WebDAV's armor. But DASL can turn this problem into an opportunity if it adopts XPATH as its basic grammar and so allows itself to leverage the excitement and investment available via the XPATH/XQUERY community.

Appendix – Some thoughts on the WebDAV Infoset

Below is a schema in RELAX NG compact syntax that describes what an infoset for the WebDAV resource space could look like:

namespace dav = "dav:infoset"
start = element dav:root { Contents* }
Contents = Collection | Resource

Resource = element dav:resource { Name+, Properties, Representation* }
Name = element dav:name { xsd:anyURI }
Properties = element dav:properties { RTNoCol & AnyButRT* }
RTNoCol = element dav:resourcetype { (element * - dav:collection { any* })* }
Representation = element dav:representation { RepProperties, BodyUri+ }
RepProperties = element dav:properties { anyElement* }
BodyUri = element dav:bodyURI { xsd:anyURI }

Collection = element dav:resource { Name+, ColProperties, Representation*, Children }
ColProperties = element dav:properties { TypeCol & AnyButRT* }
TypeCol = element dav:resourcetype { element dav:collection { anyElement* } }
Children = element dav:children { Name+ }

AnyButRT = element * - dav:resourcetype { any* }

any = anyAttribute | text | anyElement
anyAttribute = attribute * { text }
anyElement = element * { (anyAttribute | text | anyElement)* }
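
The schema's main structural rules can also be spot-checked in a few lines of Python. This is only a sketch of a handful of the constraints (at least one name, one properties element with one resourcetype, and a children list exactly when the resource is a collection); it is not a substitute for a real RELAX NG validator, and the function name and messages are mine.

```python
import xml.etree.ElementTree as ET

D = "{dav:infoset}"  # namespace of the infoset elements, in ElementTree notation


def check_resource(res):
    """Return a list of problems found in one dav:resource element."""
    problems = []
    if not res.findall(f"{D}name"):
        problems.append("resource needs at least one name")
    props = res.findall(f"{D}properties")
    rtypes = props[0].findall(f"{D}resourcetype") if len(props) == 1 else []
    if len(rtypes) != 1:
        problems.append("expected one properties element with one resourcetype")
        return problems
    is_collection = rtypes[0].find(f"{D}collection") is not None
    children = res.find(f"{D}children")
    if is_collection and (children is None or not children.findall(f"{D}name")):
        problems.append("collection needs a children element listing member names")
    if not is_collection and children is not None:
        problems.append("non-collection must not have a children element")
    return problems


GOOD = ET.fromstring(
    '<resource xmlns="dav:infoset"><name>http://example.com/a</name>'
    "<properties><resourcetype/></properties></resource>"
)
BAD = ET.fromstring(
    '<resource xmlns="dav:infoset"><name>http://example.com/c/</name>'
    "<properties><resourcetype><collection/></resourcetype></properties></resource>"
)
print(check_resource(GOOD))
print(check_resource(BAD))
```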

Q&A about the Infoset Idea

Q. Why is the infoset so flat?
A. My original design was hierarchical, where resource elements could be contained inside of collections. But the HTTP URL hierarchy is just one particular view on a WebDAV space. There are lots of other possible views, as the bindings spec enables. So rather than get into the question of 'whose view' is being seen I thought it easier to just make the whole thing flat. I also suspect that the flat hierarchy will make it easier for implementers to map between queries and implementation. But I could be completely wrong, in which case perhaps the structure should be hierarchical.

Q. How do I search on the content of a resource?
A. XPATH 2.0 includes a doc function that retrieves the document stored at a given URI and makes it part of the data available for XPATH to examine. So if one wants to search on the contents of the body one just calls the doc function on the bodyURI. Originally I had thought of just including the content of the resources directly in the infoset model, but this is probably a bad idea. As Michael Rowley pointed out to me, it makes simple queries about names and properties more complex, as one has to explicitly exclude the body content. Perhaps even more importantly, it will probably lead to people making really expensive queries. Someone looking for a resource named foo.htm is quite likely to do a search on "//foo.htm". The results of this query on an infoset that directly includes the contents of each and every resource are left as an exercise for the reader.
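
For contrast, a name lookup against the flat infoset only has to look at name elements. The Python sketch below matches on the last path segment of each name; in XPATH 2.0 the same test could be written with the ends-with function. The sample names and the helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

NS = {"d": "dav:infoset"}  # prefix chosen for this sketch only

INFOSET = """\
<root xmlns="dav:infoset">
  <resource>
    <name>http://example.com/docs/foo.htm</name>
    <properties><resourcetype/></properties>
  </resource>
  <resource>
    <name>http://example.com/docs/bar.htm</name>
    <properties><resourcetype/></properties>
  </resource>
</root>
"""


def find_by_file_name(root, file_name):
    """Match on the last path segment of each name; bodies are never consulted."""
    return [
        name.text
        for name in root.findall("d:resource/d:name", NS)
        if name.text.rsplit("/", 1)[-1] == file_name
    ]


root = ET.fromstring(INFOSET)
print(find_by_file_name(root, "foo.htm"))
```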

Q. Why is the bodyURI element's value different than the name element's?
A. A single resource can have multiple representations in various languages, encodings, etc. In that case we need a unique URI for each and every representation so it can be uniquely addressed. Keep in mind that the URI can be completely bogus. For example, the URI could be "data:id239428390432", which is just some unique identifier. The point is that the URI is only meant to be passed to the doc function in a query, although it is good hygiene to use a scheme like 'data' if the URI is not normally resolvable. I suspect, btw, that in the majority of cases the bodyURI will never actually be generated and instead will only be implicitly used in a query.

Q. Isn't the Children element unnecessary?
A. In theory one can reverse engineer the contents of collections by searching through the names in the infoset. E.g. all the children of foo should have names that match foo/*. But specs like bindings, which enables one to make arbitrary URIs members of collections, and ordering, which allows one to specify an ordering for the children of a collection, illustrate the need for an explicit membership list.
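
A small Python sketch shows where the reverse-engineering approach falls short. The URIs are invented, and the explicit list models the case described above: one member whose URI does not sit under the collection, plus a server-defined order.

```python
# Names of every resource in a hypothetical search space.
ALL_NAMES = [
    "http://example.com/foo/",
    "http://example.com/foo/a.txt",
    "http://example.com/foo/b.txt",
    "http://example.com/shared/report.doc",
]

# Explicit membership as a children element would list it (hypothetical data).
# The last entry is a member whose URI is not under foo/, and the order is the server's.
FOO_CHILDREN = [
    "http://example.com/foo/b.txt",
    "http://example.com/foo/a.txt",
    "http://example.com/shared/report.doc",
]


def children_by_prefix(collection, names):
    """Guess direct members by matching names of the form collection + one segment."""
    members = []
    for name in names:
        if name != collection and name.startswith(collection):
            rest = name[len(collection):].rstrip("/")
            if "/" not in rest:
                members.append(name)
    return members


guessed = children_by_prefix("http://example.com/foo/", ALL_NAMES)
missed = [name for name in FOO_CHILDREN if name not in guessed]
print(guessed)
print(missed)
```

The prefix match finds a.txt and b.txt but neither the extra member nor the intended order, which is what the children element is there to record.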

Q. What if a resource has multiple names?
A. That's why the name element is allowed to show up more than once.

Q. Why did you use properties in the representation element instead of just using HTTP headers?
A. I have an agenda to see WebDAV made available via web services. It would take no great effort to turn WebDAV methods into WSDL operations that could then be mapped to SOAP. So this motivates me to stay away from explicit HTTP headers. Since WebDAV already provides mappings of the major HTTP headers into the WebDAV property space, e.g. getcontenttype, I decided to just leverage that existing mapping.
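
The header-to-property mapping is simple enough to sketch in Python. The property names are the live properties WebDAV defines as mirrors of HTTP headers; the helper function and the sample headers are illustrative only.

```python
# HTTP response headers and the WebDAV live properties that mirror them.
HEADER_TO_PROPERTY = {
    "content-type": "getcontenttype",
    "content-language": "getcontentlanguage",
    "content-length": "getcontentlength",
    "etag": "getetag",
    "last-modified": "getlastmodified",
}


def representation_properties(headers):
    """Translate HTTP headers into property name/value pairs, dropping unmapped ones."""
    props = {}
    for header, value in headers.items():
        prop = HEADER_TO_PROPERTY.get(header.lower())
        if prop is not None:
            props[prop] = value
    return props


props = representation_properties(
    {"Content-Type": "text/html", "Content-Language": "de", "Server": "example"}
)
print(props)
```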



3 Comments so far

Great proposal! I'm curious, have you gotten much interest in this topic? I have some immediate and real-world needs for this exact thing.

Regards,
Eric

Comment by Eric Hanson 11.10.04 @ 6:30 pm

Julian Reschke, the man who is pretty much single-handedly keeping DASL alive, thought the idea interesting and had done some work in the same area himself but ultimately rejected it for the base draft.

His argument was that even a pared-down XQUERY was more than most people could handle implementing. I don't agree with him. I think hooking up an existing open source XQUERY engine would be a lot less painful than having to write an entire query system from scratch. I also think that customers would be a lot more willing to invest in using DASL if they knew that the investment could be leveraged in other areas. Learning DASL's basic query means you know DASL's basic query. Learning an XQUERY-based query system gives you knowledge you can leverage elsewhere.

Technically it's not a big deal since DASL is carefully designed to allow for pluggable query languages and even specifies how to discover what languages are supported. So it would be straightforward to define an XQUERY subset for DASL and publish it as a spec that could then be used in the DASL framework.

But I think DASL will not gain as much interest as it could have had it gone out the door with a sub-set of XQUERY instead of inventing its own query language.

Still, if it weren't for Julian and a few other people there wouldn't be a DASL at all, so I'm not going to complain.

Comment by Yaron 11.11.04 @ 9:28 am

Hi,

some more comments…

To clarify: I absolutely agree that this (or a similar) approach would give both a powerful query language and a compact spec. However, I don't believe that currently many people will be able to implement it, thus DASL will still require a much simpler default grammar if we want to achieve interoperability. Of course, the more time passes without DASL/DAV:basic-search progressing, the less important this will be. Maybe at this point DASL should be stripped down to the bare minimum (no sorting, no typing, no query grammar discovery) and be published as “Experimental”… Feedback (and help) appreciated… (mailing list: www-webdav-dasl@w3.org).

The latest version of the bindings specification is draft-ietf-webdav-bind-10, which is currently in working group last-call. People who are interested in this spec should review it now; it's not going to change substantially anymore.

The Ordered Collections Protocol was published in December 2003 as RFC3648.

Comment by Julian Reschke 01.22.05 @ 6:41 am


