XML Package Design

This page will contain an evolving proposal for a coherent XML design within Tango.

Goals

At the highest level, there are two ways to process XML: as a stream or as a complete document. Since each suits different situations and may have different performance characteristics, Tango should offer an interface for both.

  • Interfaces should be easy to use
  • Interfaces should leverage the power of D
  • Interfaces should follow the same patterns as elsewhere in Tango
  • Interfaces should accommodate use with XPath

Inspirations

Historically, DOM has been used for the complete-document case and SAX for the streaming case. Both are cumbersome to implement and to use. Both can also be implemented on top of another framework, so we should focus on getting the groundwork right first. Another point is that DOM showed up before XML was fully standardized, and thus isn't necessarily what one would design after some experience with XML.

VTD-XML is a package that uses binary indexing to parse XML and provide the content. The exact method seems to be patented, but the general idea of indexing shouldn't be patentable. This package is for full-document processing.
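To make the indexing idea concrete, here is a minimal sketch in D. It is not VTD-XML's method or record layout; the names IndexEntry, buildIndex and nameOf are made up for illustration, and the code assumes well-formed input without attributes, comments, processing instructions or self-closing tags. The only point is that the parser records offsets into the original buffer instead of copying strings.

```d
/// One index record: where an element name sits in the buffer and how
/// deeply the element is nested. (Hypothetical layout, for illustration.)
struct IndexEntry
{
    size_t offset; // start of the element name in the source buffer
    size_t length; // length of the element name
    int depth;     // nesting depth, 0 for the root element
}

/// Builds an index of start tags without copying any text.
/// Simplified: assumes well-formed input with no attributes, comments,
/// CDATA sections, processing instructions or self-closing tags.
IndexEntry[] buildIndex(string content)
{
    IndexEntry[] index;
    int depth = 0;
    size_t i = 0;
    while (i < content.length)
    {
        if (content[i] != '<') { ++i; continue; }

        // Find the '>' that closes this tag.
        size_t close = i + 1;
        while (close < content.length && content[close] != '>') ++close;
        if (close == content.length) break; // unterminated tag: stop

        if (content[i + 1] == '/')
            --depth; // end tag: leave the current element
        else
        {
            index ~= IndexEntry(i + 1, close - (i + 1), depth);
            ++depth; // start tag: enter a new element
        }
        i = close + 1;
    }
    return index;
}

/// Resolves a record back to the text it describes by slicing the buffer.
string nameOf(string content, IndexEntry e)
{
    return content[e.offset .. e.offset + e.length];
}

unittest
{
    auto doc = "<a><b>x</b></a>";
    auto idx = buildIndex(doc);
    assert(idx.length == 2);
    assert(nameOf(doc, idx[1]) == "b" && idx[1].depth == 1);
}
```

Since the records only hold integers, the index can be walked or searched without touching the text until a name or value is actually needed.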

XPP3 is an XML pull parser; pull parsing is considered faster and lighter than the other variants.

StAX is also a pull parsing library/specification, especially directed towards streaming.

VTD claims huge gains in speed when compared to Xerces (SAX?), but I've seen no benchmarks against the other libraries.

A nifty in-language grammar can be found here - maybe we could pull off something similar to this?

A blog entry from a guy trying to pull off something that may be close to what we want to do - here

Design suggestion

An iterator pattern is the obvious choice; with XML this means iterating over one or more of tokens, elements, tags and so on. Furthermore, the user must be able to extract information from the current cursor position, move to other positions (if the user knows enough about the document), and edit the document (apparently not possible with the API on xmlpull.org).
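As a rough illustration of the iterator idea, and not the proposed Tango API, a pull-style cursor could be exposed as a D input range so that it works with foreach. The names TokenType, Token and PullCursor are hypothetical. The sketch covers only iteration and reading the current position; moving to arbitrary positions and editing are left out, as are attributes, comments, CDATA, entities and error reporting.

```d
import std.string : indexOf;

/// Kinds of token the cursor can be positioned on (hypothetical names).
enum TokenType { startTag, endTag, text }

/// A token is a typed slice of the input; nothing is copied.
struct Token
{
    TokenType type;
    string value; // tag name for tags, raw character data for text
}

/// Hypothetical pull-style cursor exposed as a D input range.
struct PullCursor
{
    private string rest;   // unparsed remainder of the document
    private Token current; // token at the current cursor position
    private bool done;

    this(string content)
    {
        rest = content;
        popFront(); // position the cursor on the first token
    }

    @property bool empty() { return done; }
    @property Token front() { return current; }

    /// Advances the cursor to the next token, or marks the range empty.
    void popFront()
    {
        if (rest.length == 0) { done = true; return; }

        if (rest[0] == '<')
        {
            auto close = rest.indexOf('>');
            if (close < 0) { done = true; return; } // malformed input: stop
            if (rest[1] == '/')
                current = Token(TokenType.endTag, rest[2 .. close]);
            else
                current = Token(TokenType.startTag, rest[1 .. close]);
            rest = rest[close + 1 .. $];
        }
        else
        {
            auto open = rest.indexOf('<');
            size_t end = open < 0 ? rest.length : cast(size_t) open;
            current = Token(TokenType.text, rest[0 .. end]);
            rest = rest[end .. $];
        }
    }
}

unittest
{
    Token[] seen;
    foreach (tok; PullCursor("<a>hi</a>"))
        seen ~= tok;
    assert(seen.length == 3);
    assert(seen[1] == Token(TokenType.text, "hi"));
}
```

Because the cursor only ever holds the current token and the unparsed remainder, the same shape could sit on top of a stream as well as a complete in-memory document.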

An API suggestion should follow below ...

Discussion

Discussion can take place here, or in the relevant post in the forum. The design can change as suggestions come in (first stage) and are eventually agreed on (second stage).

Delta between Tango's Document and W3C DOM APIs

This chart maps equivalent API calls between Tango's Document, the W3C DOM and the Java DOM.

| How do I... | Tango | W3C DOM | Java DOM |
|---|---|---|---|
| parse an XML document? | auto doc = new Document!(char); doc.parse(content); | N/A (differs between languages) | DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(content); |
| start an XPath-style query? | doc.query | document.createExpression() | XPathFactory.newInstance().newXPath(); |
| create a new document? | auto doc = new Document!(char) | document.getImplementation().createDocument() | DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument(); |
| add an XML prolog to a new document? | doc.header | document.appendChild(document.createProcessingInstruction("target", "instruction")); | same as DOM, but usually done at serialization time |
| add a new element to the doc? | doc.element("foo"); | elem = doc.createElement("foo"); doc.appendChild(elem); | elem = doc.createElement("foo"); doc.appendChild(elem); |
| add an attribute? | elem.attribute(prefix, localName, value); | elem.setAttributeNS(uri, name, value); | elem.setAttributeNS(uri, name, value); |