Total DOMination
By Michael Floyd
Without question, the DOM is well documented in the W3C specification, in the documentation of various parser tools, and in the growing collection of XML books on the market. To that effect, the DOM is probably the best-documented component of XML (outside of XML itself), largely because the specification is stable and it's been available for roughly two years.
For all of the information about the DOM and its workings, however, there still seems to be little solid information on how you can put it to work. This month, I'd like to take a look at the DOM and show how you can perform actual tasks using it. Of course, the DOM encompasses far more than can be covered in a single article. Instead, I'll provide a brief overview of the DOM, then focus on the Core DOM, the portion of the API applicable to XML. (I've provided a list of sites where you can get a more complete description of the individual API interfaces in " Online".) In particular, it's important to understand the Node, NodeList, and NamedNodeMap interfaces. It's also useful to be able to predict and compensate for the DOM's pitfalls.
DOM Overview
When an XML parser loads a document, it essentially scans the document, looks for elements, attributes, text, and so on, and constructs a hierarchical tree based on those items. When the parser encounters, say, an element in the source document, it creates a new node in the tree containing a representation of that element. (This is somewhat of an oversimplification, but it's sufficient for this discussion.) The DOM's purpose, then, is to provide a standard set of programming interfaces for accessing the nodes in this document tree, and for reading, writing, and modifying individual nodes or entire fragments of the tree.
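To make the tree idea concrete, here is a minimal sketch that parses a small document and prints one line per node. It uses Python's standard xml.dom.minidom module as a stand-in for the parsers discussed in this article; the sample markup and the outline helper are assumptions made up for illustration.

```python
from xml.dom import minidom

# Hypothetical sample document; the element names are invented for illustration.
SOURCE = "<news><story id='s1'><title>DOM basics</title></story></news>"


def outline(node, depth=0, lines=None):
    """Collect one indented line per node: its nodeType code and nodeName."""
    if lines is None:
        lines = []
    lines.append("  " * depth + f"{node.nodeType} {node.nodeName}")
    for child in node.childNodes:
        outline(child, depth + 1, lines)
    return lines


doc = minidom.parseString(SOURCE)
tree = outline(doc)
print("\n".join(tree))
# 9 #document
#   1 news
#     1 story
#       1 title
#         3 #text
```

Note that the id attribute does not show up in the outline. Attributes belong to their element but are not its children, so a walk over childNodes never visits them.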
It's important to note that at the time of this writing, the DOM Level 1 specification is the current standard. However, DOM Level 2 is near completion and the W3C has already begun work on a Level 3 specification. Because few tools fully implement DOM Level 2, let's focus on DOM Level 1. In DOM Level 1, the API can be organized into four objects: Document, Node, NodeList, and NamedNodeMap.
Accessing the Document
For all of the promise that the DOM is platform and language "neutral," there are features of the DOM that tie it both to the implementation and language being used to access it. This is particularly true when it comes to instantiating a new DOM object. You see, before you can access an XML document via the DOM, you must create a new instance of a DOM object and populate that object with the XML document data. The problem is, there's no standard means for "bootstrapping" the DOM. That is, the standard doesn't specify how a DOM object should be created because object creation is language and platform specific. So, this part of the code will always be specific to both the language and the parser you're using. For example, using the MSXML software development kit and ECMAScript, the code would look something like that in Example 1.
The code creates a new instance of the MSXML parser and assigns it to a variable called xmlDocument. The reference to Microsoft.XMLDOM (called a progID) points to the MSXML version 1 parser. Earlier this year, Microsoft announced a preview release of their version 2 parser, which may be available by the time this reaches print. Since that time, the company has also announced a third generation of the MSXML parser. The date on the final release of the third generation is still in question. However, if you install the latest preview release (see " Online"), you can access technology previews of these parsers. This is because they reside as DLLs and can be called separately by replacing the progID mentioned above with MSXML2.DOMDocument for version 2 and MSXML2.DOMDocument.3.0 for version 3.
Once you've instantiated a DOM object, you can use the Document interface to access details about your document. The Document object contains three attributes: doctype, documentElement, and implementation. The most important of these is documentElement, which lets you get the root element of your document as follows:
var docRoot = xmlDocument.documentElement;
Once you have the root element you can traverse the document tree, query the properties of nodes, modify, replace, or remove nodes, create new nodes, and so on. These operations effectively modify your XML document.
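The split between implementation-specific bootstrapping and the standard Document attributes can be sketched outside MSXML as well. The example below assumes Python's xml.dom.minidom rather than the MSXML parser; the sample document and the news.dtd system identifier are hypothetical. Only the parseString call belongs to the library, while doctype, documentElement, and implementation are the DOM Level 1 names.

```python
from xml.dom import minidom

# Hypothetical stand-in for a document such as news.xml.
SOURCE = """<?xml version="1.0"?>
<!DOCTYPE news SYSTEM "news.dtd">
<news><story>First</story><story>Second</story></news>"""

# Implementation-specific step: every DOM library loads documents its own way.
xml_document = minidom.parseString(SOURCE)

# From here on, only DOM Level 1 attribute names are used.
doc_root = xml_document.documentElement
doc_type = xml_document.doctype
dom_impl = xml_document.implementation

print(doc_root.nodeName)                  # news
print(doc_type.name, doc_type.systemId)   # news news.dtd
print(dom_impl.hasFeature("XML", "1.0"))  # True
```

Swapping parsers should, in principle, change only the loading line; the three attribute lookups stay the same.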
The doctype property lets you retrieve information about the DOCTYPE declaration of an XML document. In DOM Level 1, however, there's no way to modify an existing declaration or create a new one. This may be a serious drawback if you're generating XML on the fly and need to validate documents on the way. I've come up with a quick solution. Typically, you'll be working only with a limited number of DTDs, and often only with one. The strategy is to create a collection of stub XML documents containing a document prolog and a dummy root element. From there, you can read in the appropriate stub, modify the root element, and begin creating your document. It's not elegant, but it works.
The Document interface also contains a set of methods, called factory methods, that let you create new node types including elements, text nodes, comments, processing instructions, CDATA sections, attributes, and entity references. A list of some of the factory methods is presented in Table 1. A complete list is available at webtechniques.com.
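The stub strategy and the factory methods fit together naturally. What follows is a minimal sketch under stated assumptions: it uses Python's xml.dom.minidom instead of MSXML, and the stub text, the news.dtd identifier, and the element content are all hypothetical. The stub supplies the DOCTYPE declaration that DOM Level 1 cannot create, and the factory methods supply everything else.

```python
from xml.dom import minidom

# Hypothetical stub: a prolog naming the DTD plus an empty root element.
STUB = '<!DOCTYPE news SYSTEM "news.dtd"><news/>'

doc = minidom.parseString(STUB)
root = doc.documentElement

# Factory methods live on the Document. A new node is not part of the
# tree until a Node method such as appendChild attaches it.
story = doc.createElement("story")
story.appendChild(doc.createTextNode("DOM Level 2 nears completion"))

root.appendChild(doc.createComment("generated on the fly"))
root.appendChild(story)
root.appendChild(doc.createCDATASection("<raw> & unescaped"))

result = doc.toxml()
print(result)
```

The serialized result still carries the DOCTYPE declaration from the stub, so a validating parser can check the generated document against the DTD.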
The Node Interface
Because everything in the document tree is nodal, the Node interface provides the bulk of the routines with which you'll be working. Table 2 contains a list of some of the properties associated with the Node interface, and Table 3 presents methods available from the Node object.
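As a rough illustration of a few Node properties and methods, the sketch below builds a tiny tree and edits it. It assumes Python's xml.dom.minidom in place of MSXML, and the placeholder elements a, b, and c are invented; the property and method names themselves (firstChild, insertBefore, cloneNode, and so on) are the DOM Level 1 ones.

```python
from xml.dom import minidom

# Hypothetical document; a, b, and c are placeholder element names.
doc = minidom.parseString("<news><a/><b/><c/></news>")
root = doc.documentElement

# Properties: each node knows its name, type, parent, and siblings.
first = root.firstChild
print(first.nodeName, first.nodeType, first.parentNode.nodeName)  # a 1 news
print(first.nextSibling.nodeName, root.lastChild.nodeName)        # b c
print(root.childNodes.length, root.childNodes.item(0).nodeName)   # 3 a

# Methods: insert before the first child, remove the last child,
# then append at the end.
new_elem1 = doc.createElement("newElem1")
new_elem2 = doc.createElement("newElem2")
root.insertBefore(new_elem1, root.firstChild)
removed = root.removeChild(root.lastChild)
root.appendChild(new_elem2)
names_after = [child.nodeName for child in root.childNodes]
print(names_after)  # ['newElem1', 'a', 'b', 'newElem2']

# cloneNode(False) copies only the element itself;
# cloneNode(True) copies its whole subtree as well.
shallow = root.cloneNode(False)
deep = root.cloneNode(True)
print(shallow.hasChildNodes(), deep.childNodes.length)  # False 4
```

The removed node is returned rather than destroyed, so it can be reinserted elsewhere in the tree if needed.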
Listing 1 shows how you can use the Node methods and properties to report on various features of an XML document called news.xml. When you run the program, the results are displayed in an alert box. Recall from Table 1 that the Document interface includes a collection of factory methods for creating nodes, and from Table 3 that the Node interface lets you insert nodes into the document tree, replace them, remove them, and so on. Once you dismiss the alert box, the program uses the factory methods to create two new elements, newElem1 and newElem2. Then the Node interface's insertBefore method inserts the newElem1 element just before the first child of the root element.
The next statement uses the removeChild method to delete the last child from the document. And the following statement appends newElem2 to the end of the document tree using the appendChild method. Finally, the program writes the result out to a file called test1.xml. Note that if you're running this script in Internet Explorer 5, you may receive a file-permissions error. You can avoid this problem by giving your script an .hta file extension.
Working with NodeLists
Example 2 is a fragment of a program that uses the childNodes property to iterate through the children of the current node. The property returns a NodeList object containing all children of the current node. If there are no children, you get an empty NodeList object. The content of the returned object is "live" in a sense. For example, changes to the node's children are immediately reflected in the data returned by NodeList. This is obviously a good thing, because you don't have to write additional code to update the list.
The next point to note is that the childNodes property returns a zero-based collection. And because a NodeList object (and therefore the childNodes property) is a collection, the childNodes property lets you query its length, which lets you determine the number of items in the list. As mentioned, Example 2 uses this feature to iterate through the list using a for loop.
Collections also support an item method, which lets you access the individual nodes in the collection. Example 2 assigns the item to a node variable, then calls the removeChild method to remove it from the tree. Because the collection is live, the change is immediately reflected in the collection: its length drops by one and the remaining nodes shift down an index, which matters if you remove nodes while looping forward through the list.
Of course, you'll want to be careful not to delete the root element. Saving such a document will have unpredictable results. This also brings up another good point: You should always save changes out to a second file. Overwriting a perfectly good document with bad data can quickly ruin your day. As an extension of the previous example, Example 3 checks the hasChildNodes method to determine whether the current node contains any children and if so, replaces them with a newly created element, elemD.
Finally, to demonstrate the use of the cloneNode method, Listing 2 opens a document and clones the document element. Basically, the cloneNode method copies the current node and returns the copy. The method takes a single Boolean value as a parameter. If the value is true, the method clones the entire subtree; otherwise it copies only the node itself. In that case, cloning an element still copies all attributes and their values, but the text is not copied because it's stored as a child text node. Then the program adds a new element (so that you'll be able to verify that the document was modified), and saves the result as test4.xml.
The NamedNodeMap Interface
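The most familiar NamedNodeMap is the attributes property of an element, so a short sketch may help anchor the description in this section. It assumes Python's xml.dom.minidom rather than MSXML, and the story element with its id and lang attributes is hypothetical. One detail worth seeing in code is that setNamedItem is handed a node, not a name.

```python
from xml.dom import minidom

# Hypothetical element; the attribute names are invented for illustration.
doc = minidom.parseString('<story id="s1" lang="en"/>')
attrs = doc.documentElement.attributes  # a NamedNodeMap

count_before = attrs.length
lang = attrs.getNamedItem("lang")        # look a node up by name
missing = attrs.getNamedItem("author")   # no such node, so None (null)

# setNamedItem takes a node. If a node with the same name is already
# in the map, it is replaced and the old node is returned.
new_id = doc.createAttribute("id")
new_id.value = "s2"
replaced = attrs.setNamedItem(new_id)

# removeNamedItem takes a name and returns the node it removed.
removed = attrs.removeNamedItem("lang")

# item gives positional access, although the order carries no meaning.
names = sorted(attrs.item(i).nodeName for i in range(attrs.length))
print(count_before, lang.value, missing, removed.nodeName, names)
# 2 en None lang ['id']
print(attrs.getNamedItem("id").value)  # s2
```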
A NamedNodeMap object is similar to a NodeList object, except that you can reference its members by name rather than using a value indexed against an ordered collection. In this sense, it acts more like an associative array, or a struct in the C programming language. The NamedNodeMap object has just one attribute, length, that references the number of nodes in the map. This interface also supports four methods: getNamedItem, setNamedItem, removeNamedItem, and item. The getNamedItem method takes a string identifying a node name and returns that node. The setNamedItem method takes a node and adds it to the collection under its node name. If a node with that name already exists, it's replaced, and the replaced node is returned. Otherwise, a null value is returned. Also, if the node is an attribute that's already in use on another element, an exception is raised. Likewise, removeNamedItem takes the name of a node and removes it from the collection. If successful, the node is returned. Finally, item takes an index value and returns the node at that position. To test these methods and see how you can use them, see Listing 3.
Conclusion
There's a lot more to the DOM than I can cover in one article. There are the extended interfaces that let you work directly with elements, attributes, character data, entities, processing instructions, and other items. And of course, DOM Level 2 is on the horizon. DOM Level 2 introduces new models to handle events, views, style sheets, CSS, and tree traversal. As tools make their debut, I'll explore each of these topics in detail. In the meantime, you should practice loading documents into the DOM, walking through nodes, and creating, modifying, cloning, and deleting them. The more familiar you are with manipulations now, the more you'll be able to accomplish when advanced tools become available later.
(Get the source code for this article here.)
Michael is the author of Building Web Sites with XML from Prentice Hall, and architect of the Rocket XML framework. He's also the publisher of LifestylesSantaCruz.com and carries the honorary title of editor at large at Web Techniques. He can be reached at mfloyd@lifestylesSantaCruz.com.