The Web's slow but inexorable movement from a read-only to a collaborative environment is increasing WebDAV's success. But WebDAV still has a serious functional gap: search. The DASL community has been keeping hope alive by continuing to work on a search grammar for WebDAV. But much as WebDAV adopted XML both to solve real problems and to ride on the coat tails of XML's success, DASL could solve a number of serious technical issues, increase its own visibility, and leverage the excitement and investment in the XPATH/XQUERY community if it adopted a profile of XPATH 2.0 as its basic search grammar. In the article below I discuss some of the details of how DASL could use XPATH.

The Background

WebDAV is a very cool technology that I summarized in a short article I wrote a while back. To quickly recap that article, WebDAV is an HTTP-based protocol for accessing data. It defines how to surf the data's hierarchy, how to add/edit/copy/move/delete both folders and files in that hierarchy, and how to set/get properties belonging to folders/files in the hierarchy. It works great for accessing e-mail, file systems, directories, databases, whatever. It's a standard, it's widely adopted and it's very nifty.

The Problem

WebDAV has a weakness that has been causing me some problems as I promote its use – Search.

The DASL effort to add search to WebDAV has been going on for some time now, and the latest draft, dated January of this year, is keeping hope alive. But there is a technique that was used to outstanding effect with WebDAV that I think could really help DASL get a lot of community interest and support: riding on someone else's coat tails. In the case of WebDAV, the coat tails we rode on were XML's. WebDAV was the first protocol I'm aware of to use XML, and this got us tons of free interest and publicity. Thankfully, XML added real value to WebDAV, so the standards committee can be proud of its pioneering efforts.

The Pitch

Today there are new coat tails that I think DASL could ride on: XPATH/XQUERY. After more years than anyone cares to count, it looks like XPATH/XQUERY will finally finish sometime this decade. Meanwhile, everyone is already investing in XQUERY. At last count there were 29 or so implementations, several of which are open source. JSR 225, which will standardize an API for XQUERY, was founded by Oracle and IBM and is supported by BEA and Sun. BEA, Microsoft and Oracle all have XPATH/XQUERY implementations. The XPATH/XQUERY bandwagon is clearly one worth being on and one that could really help pump up interest in DASL. And, much as with XML's contribution to WebDAV, XPATH/XQUERY has a ton of real value to add to DASL.

I realize that XPATH/XQUERY is significantly more complex than DASL's current basic query, but given the enormous number of existing implementations, including open source ones, I do not believe that the complexity of XPATH/XQUERY need be a burden on the WebDAV community. Equally importantly, XPATH/XQUERY has already figured out a lot of very hard problems regarding how to search XML. By using XPATH/XQUERY, the DASL community would avail itself of the enormous experience built right into XPATH/XQUERY.

The Details

What I am suggesting is that DASL make one big change: replace its basic grammar with some profile of XPATH 2.0. This, of course, raises the question of exactly what an XPATH 2.0 expression submitted in a WebDAV SEARCH method would be querying. After all, XPATH expects to be applied against something. What is the 'something' that it would be searching?

I propose that XPATH be applied against a representation of the WebDAV resource space as an XML infoset. An XML infoset is essentially an abstract representation of the XML object model. It consists of a hierarchy whose entries are elements, attributes, text, etc., and it allows one to model XML data independently of the actual XML serialization.

So what my proposal boils down to is creating a virtual XML document, represented by the infoset, whose contents are the names, properties and URIs of the resources within a WebDAV search space. The XPATH would then be applied against that virtual document. An XML serialization of the virtual document, something which would never be created in the real world and is only included here to make it easier for the reader to visualize what I'm talking about, could look like:

<root xmlns="dav:infoset">
  <resource>
    <name>http://example.com/</name>
    <properties>
      <displayname>example.com root</displayname>
      <resourcetype><collection/></resourcetype>
    </properties>
    <children>
      <name>http://example.com/index.html</name>
    </children>
  </resource>
  <resource>
    <name>http://example.com/index.html</name>
    <properties>
      <getcontenttype>text/html</getcontenttype>
      <resourcetype/>
    </properties>
    <representation>
      <properties>
        <getcontenttype>text/html</getcontenttype>
        <getcontentlanguage>en</getcontentlanguage>
      </properties>
      <bodyURI>http://example.com/index.html;lang=en</bodyURI>
    </representation>
    <representation>
      <properties>
        <getcontenttype>text/html</getcontenttype>
        <getcontentlanguage>de</getcontentlanguage>
      </properties>
      <bodyURI>http://example.com/index.html;lang=de</bodyURI>
    </representation>
  </resource>
</root>

Queries would be executed against this infoset as if it really existed. In reality the WebDAV server would take the incoming query and translate it into a form that matched how the system actually stored information. It bears repeating that the infoset is a conceptual, not an actual, entity. I'm not suggesting that WebDAV servers go create a huge XML document with all their data in it and then run XQUERY against it.
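To make the idea concrete, here is a minimal Python sketch that runs two simple queries against a literal serialization of the virtual document. Treat it purely as an illustration: a real server would never build this document, Python's standard ElementTree module supports only a small XPath 1.0-style subset rather than XPATH 2.0, and the helper names, the "d" prefix and the all-lowercase property spellings are assumptions of the sketch.

```python
import xml.etree.ElementTree as ET

# Illustrative serialization of the virtual document; a real server would
# answer the same queries from its native store without ever building it.
INFOSET = """
<root xmlns="dav:infoset">
  <resource>
    <name>http://example.com/</name>
    <properties>
      <displayname>example.com root</displayname>
      <resourcetype><collection/></resourcetype>
    </properties>
    <children>
      <name>http://example.com/index.html</name>
    </children>
  </resource>
  <resource>
    <name>http://example.com/index.html</name>
    <properties>
      <getcontenttype>text/html</getcontenttype>
      <resourcetype/>
    </properties>
  </resource>
</root>
"""

NS = {"d": "dav:infoset"}  # the prefix "d" is an arbitrary choice for this sketch
root = ET.fromstring(INFOSET)


def collection_names(root):
    """Names of resources whose resourcetype contains a collection element."""
    return [
        res.findtext("d:name", namespaces=NS)
        for res in root.findall("d:resource", NS)
        if res.find("d:properties/d:resourcetype/d:collection", NS) is not None
    ]


def names_with_content_type(root, content_type):
    """Names of resources whose getcontenttype property equals content_type."""
    return [
        res.findtext("d:name", namespaces=NS)
        for res in root.findall("d:resource", NS)
        if res.findtext("d:properties/d:getcontenttype", namespaces=NS) == content_type
    ]


print(collection_names(root))
print(names_with_content_type(root, "text/html"))
```

In XPATH terms the second helper corresponds roughly to /d:root/d:resource[d:properties/d:getcontenttype = 'text/html']/d:name, with the prefix d bound to dav:infoset.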

Queries that need to examine both the name/property space and the contents of actual resources could use the document-uri function. This function returns the contents of the resource it is pointed at and makes them available for querying. See the Q&A in the appendix for a discussion of why I chose this approach.
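A rough Python sketch of that two-step pattern: first narrow the search using properties in the infoset, then dereference the bodyURI only for the representations that survive. The BODIES dictionary and fetch_body helper are hypothetical stand-ins for whatever the server does when the XPATH function dereferences a URI, and the sample bodies are invented for illustration.

```python
import xml.etree.ElementTree as ET

NS = {"d": "dav:infoset"}

INFOSET = """<root xmlns="dav:infoset">
  <resource>
    <name>http://example.com/index.html</name>
    <properties><resourcetype/></properties>
    <representation>
      <properties><getcontentlanguage>en</getcontentlanguage></properties>
      <bodyURI>http://example.com/index.html;lang=en</bodyURI>
    </representation>
    <representation>
      <properties><getcontentlanguage>de</getcontentlanguage></properties>
      <bodyURI>http://example.com/index.html;lang=de</bodyURI>
    </representation>
  </resource>
</root>"""

# Hypothetical body store keyed by bodyURI; invented content for the sketch.
BODIES = {
    "http://example.com/index.html;lang=en": "<html><body><p>Welcome</p></body></html>",
    "http://example.com/index.html;lang=de": "<html><body><p>Willkommen</p></body></html>",
}


def fetch_body(uri):
    """Stand-in for dereferencing a bodyURI: returns the parsed body."""
    return ET.fromstring(BODIES[uri])


def resources_with_body_text(root, language, needle):
    """Names of resources having a representation in `language` containing `needle`."""
    hits = []
    for res in root.findall("d:resource", NS):
        for rep in res.findall("d:representation", NS):
            lang = rep.findtext("d:properties/d:getcontentlanguage", namespaces=NS)
            if lang != language:
                continue  # cheap property test first, body is never touched
            body = fetch_body(rep.findtext("d:bodyURI", namespaces=NS))
            if needle in "".join(body.itertext()):
                hits.append(res.findtext("d:name", namespaces=NS))
    return hits


root = ET.fromstring(INFOSET)
print(resources_with_body_text(root, "de", "Willkommen"))
```

The point of the ordering is that body content is only fetched when a query explicitly asks for it, so queries about names and properties stay cheap.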

At some point full XQUERY support would be needed, but I would remind the reader of an old design rule: the spec is done when there is nothing left to cut. A standardized DASL specification with the current query language negotiation features and a 'basic grammar' based on XPATH 2.0 would be very useful and well worth standardizing. One could then have a later extension spec that added in full XQUERY support as a new pluggable query language option.
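As a sketch of what such a pluggable grammar might look like on the wire, the Python below builds a SEARCH request body that wraps an XPATH expression in DASL's searchrequest element. The searchrequest element in the DAV: namespace comes from DASL; the xpath element and its urn:example:dasl-xpath namespace are purely hypothetical, as is the assumption that the grammar would bind the prefix d to dav:infoset.

```python
import xml.etree.ElementTree as ET

DAV_NS = "DAV:"                      # namespace DASL uses for searchrequest
XPATH_NS = "urn:example:dasl-xpath"  # hypothetical namespace for the new grammar


def build_search_body(xpath_expression):
    """Return a SEARCH request body carrying one XPATH expression."""
    ET.register_namespace("D", DAV_NS)
    ET.register_namespace("x", XPATH_NS)
    request = ET.Element("{%s}searchrequest" % DAV_NS)
    # The grammar element's name identifies which query language is in use.
    grammar = ET.SubElement(request, "{%s}xpath" % XPATH_NS)
    grammar.text = xpath_expression
    return ET.tostring(request, encoding="unicode")


print(build_search_body(
    "/d:root/d:resource[d:properties/d:getcontenttype = 'text/html']/d:name"
))
```

Because the grammar is identified by the child element of searchrequest, a later full XQUERY option could be added as another element without disturbing this one.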

The Conclusion

As the Web finally, slowly, moves from a read-only to a truly collaborative environment, WebDAV is increasingly finding an audience that realizes it solves real problems. Unfortunately, search is still a chink in WebDAV's armor. But DASL can turn this problem into an opportunity if it adopts XPATH as its basic grammar and so allows itself to leverage the excitement and investment available via the XPATH/XQUERY community.

Appendix – Some thoughts on the WebDAV Infoset

Below is a RelaxNG compact schema that describes what an infoset for the WebDAV resource space could look like:

namespace dav = "dav:infoset"
start = element dav:root { Contents* }
Contents = Collection | Resource

Resource = element dav:resource { Name+, Properties, Representation* }
Name = element dav:name { xsd:anyURI }
Properties = element dav:properties { RTNoCol & AnyButRT* }
RTNoCol = element dav:resourcetype { (element * - dav:collection { any* })* }
Representation = element dav:representation { RepProperties, BodyUri+ }
RepProperties = element dav:properties { anyElement* }
BodyUri = element dav:bodyURI { xsd:anyURI }

Collection = element dav:resource { Name+, ColProperties, Representation*, Children }
ColProperties = element dav:properties { TypeCol & AnyButRT* }
TypeCol = element dav:resourcetype { element dav:collection { anyElement* } }
Children = element dav:children { Name* }

AnyButRT = element * - dav:resourcetype { any* }

any = anyAttribute | text | anyElement
anyAttribute = attribute * { text }
anyElement = element * { (anyAttribute | text | anyElement)* }
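For readers without a RelaxNG validator at hand, the Python sketch below hand-checks a few of the structural rules the schema expresses: every resource has at least one name and exactly one properties element with exactly one resourcetype, only collections carry a children element, and every representation has properties and a bodyURI. It is a partial approximation, not a replacement for validating against the schema; it ignores element order and the wildcard rules, and the function names are my own.

```python
import xml.etree.ElementTree as ET

DAV = "dav:infoset"


def q(local_name):
    """Qualified tag name in the dav:infoset namespace."""
    return "{%s}%s" % (DAV, local_name)


def check_resource(res):
    """Return a list of problems for one dav:resource element (partial check)."""
    problems = []
    if not res.findall(q("name")):
        problems.append("resource needs at least one name")
    props = res.findall(q("properties"))
    if len(props) != 1:
        problems.append("resource needs exactly one properties element")
        return problems
    resource_types = props[0].findall(q("resourcetype"))
    if len(resource_types) != 1:
        problems.append("properties needs exactly one resourcetype")
        return problems
    is_collection = resource_types[0].find(q("collection")) is not None
    has_children = res.find(q("children")) is not None
    if is_collection and not has_children:
        problems.append("collection is missing its children element")
    if has_children and not is_collection:
        problems.append("only collections may have a children element")
    for rep in res.findall(q("representation")):
        if rep.find(q("properties")) is None:
            problems.append("representation needs a properties element")
        if not rep.findall(q("bodyURI")):
            problems.append("representation needs at least one bodyURI")
    return problems


def check_infoset(xml_text):
    """Check a serialized infoset and return all problems found."""
    root = ET.fromstring(xml_text)
    if root.tag != q("root"):
        return ["document element must be dav:root"]
    problems = []
    for index, child in enumerate(root):
        if child.tag != q("resource"):
            problems.append("entry %d: unexpected element" % index)
            continue
        problems.extend("entry %d: %s" % (index, p) for p in check_resource(child))
    return problems
```

Calling check_infoset on a serialization returns an empty list when none of these rules is violated.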

Q&A about the Infoset Idea

Q. Why is the infoset so flat?
A. My original design was hierarchical, with resource elements contained inside collections. But the HTTP URL hierarchy is just one particular view on a WebDAV space. There are lots of other possible views, as the bindings spec enables. So rather than get into the question of 'whose view' is being seen, I thought it easier to just make the whole thing flat. I also suspect that the flat structure will make it easier for implementers to map between queries and implementation. But I could be completely wrong, in which case perhaps the structure should be hierarchical.

Q. How do I search on the content of a resource?
A. XPATH includes a document-uri function call whose response is the data recorded at that URI, which then becomes part of the data available for XPATH to examine. So if one wants to search on the contents of the body, one just uses the document-uri function call. Originally I had thought of including the content of the resources directly in the infoset model, but this is probably a bad idea. As Michael Rowley pointed out to me, it makes simple queries about names and properties more complex, as one has to explicitly exclude the body content. Perhaps even more importantly, it will probably lead to people making really expensive queries. Someone looking for a resource named foo.htm is quite likely to do a search on "//foo.htm". The results of this query on an infoset that directly includes the contents of each and every resource are left as an exercise for the reader.

Q. Why is the bodyURI element's value different than the name element's?
A. A single resource can have multiple representations in various languages, encodings, etc. In that case we need a unique URI for each representation so that it can be uniquely addressed. Keep in mind that the URI can be completely bogus. For example, the URI could be "data:id239428390432", which is just some unique identifier. The point is that the URI is only meant to be used in the document-uri XQUERY function, although it is good hygiene to use a scheme like 'data' if the URI is not normally resolvable. I suspect, btw, that in the majority of cases the bodyURI will never actually be generated and instead will only be used implicitly in a query.

Q. Isn't the Children element unnecessary?
A. In theory one can reverse engineer the contents of collections by searching through the names in the infoset. E.g. all the children of foo should have names that match foo/*. But specs like bindings, which enables one to make arbitrary URIs members of collections, or ordering, which allows one to specify an ordering for the children of a collection, illustrate the need for an explicit membership list.
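The ordering case is easy to show with a small Python sketch. Guessing membership from the shape of the names recovers the same set of children here, but only the explicit children list carries the collection's ordering. The sample infoset, the docs/ collection and both helper names are invented for the illustration.

```python
import xml.etree.ElementTree as ET

NS = {"d": "dav:infoset"}

INFOSET = """<root xmlns="dav:infoset">
  <resource>
    <name>http://example.com/docs/</name>
    <properties><resourcetype><collection/></resourcetype></properties>
    <children>
      <name>http://example.com/docs/z.html</name>
      <name>http://example.com/docs/a.html</name>
    </children>
  </resource>
  <resource>
    <name>http://example.com/docs/a.html</name>
    <properties><resourcetype/></properties>
  </resource>
  <resource>
    <name>http://example.com/docs/z.html</name>
    <properties><resourcetype/></properties>
  </resource>
</root>"""


def children_listed(root, collection):
    """Members of `collection` as given by its explicit children element."""
    for res in root.findall("d:resource", NS):
        names = [n.text for n in res.findall("d:name", NS)]
        if collection in names:
            return [n.text for n in res.findall("d:children/d:name", NS)]
    return []


def children_guessed(root, collection):
    """Members of `collection` guessed from names of the form collection/*."""
    guessed = []
    for name in root.findall("d:resource/d:name", NS):
        uri = name.text
        rest = uri[len(collection):] if uri.startswith(collection) else ""
        # Keep only direct children: one path segment below the collection.
        if rest and "/" not in rest.rstrip("/"):
            guessed.append(uri)
    return guessed


root = ET.fromstring(INFOSET)
print(children_listed(root, "http://example.com/docs/"))
print(children_guessed(root, "http://example.com/docs/"))
```

Both calls return the same two URIs, but only the first returns them in the order the collection defines.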

Q. What if a resource has multiple names?
A. That's why the name element is allowed to show up more than once.

Q. Why did you use properties in the representation element instead of just using HTTP headers?
A. I have an agenda to see WebDAV made available via web services. It would take no great effort to turn WebDAV methods into WSDL operations that could then be mapped to SOAP. So this motivates me to stay away from explicit HTTP headers. Since WebDAV already provides mappings of the major HTTP headers into the WebDAV property space, e.g. getcontenttype, I decided to just leverage that existing mapping.
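A minimal Python sketch of that mapping, turning a dictionary of HTTP response headers into the properties element of a representation. The five property names are the standard WebDAV ones that mirror HTTP headers; the function name is hypothetical, and the properties are placed in the dav:infoset namespace only because that is what the sample document in this article does.

```python
import xml.etree.ElementTree as ET

DAV = "dav:infoset"  # follows this article's sample document

# HTTP headers (lowercased) and the WebDAV properties that mirror them.
HEADER_TO_PROPERTY = {
    "content-type": "getcontenttype",
    "content-language": "getcontentlanguage",
    "content-length": "getcontentlength",
    "etag": "getetag",
    "last-modified": "getlastmodified",
}


def representation_properties(headers):
    """Build a properties element from a dict of HTTP response headers."""
    props = ET.Element("{%s}properties" % DAV)
    for header, value in headers.items():
        prop_name = HEADER_TO_PROPERTY.get(header.lower())
        if prop_name is None:
            continue  # headers with no property equivalent are left out
        ET.SubElement(props, "{%s}%s" % (DAV, prop_name)).text = value
    return props


props = representation_properties(
    {"Content-Type": "text/html", "Content-Language": "de", "Server": "example"}
)
print(ET.tostring(props, encoding="unicode"))
```

Header names are matched case-insensitively, since HTTP header names are not case sensitive.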