Searching XML Databases

By Michael Floyd

When teaching XML, I often explain that XSL can be used to perform many of the same functions as the DOM. The basis for this comment is that both the DOM and XSL (or more specifically, the combination of XSLT and XPath), provide mechanisms for walking document trees, creating new elements, attributes, and entities, and letting you modify them in various ways. Thus, a large amount of what you can do in XSL you can do using the DOM, and vice versa.

My students often ask whether they can altogether forgo either XSL or the DOM. While the short answer often is yes, a quiet voice inside my head always says, "use both." There's no reason why your decision to use one should exclude the other. I often combine DOM and XSL to get the most out of their document manipulation and presentation features.

This is particularly true when it comes to databases. When you have XML documents that represent the result sets of database queries, you can combine DOM and XSL techniques to search them, filter the result sets, perform refinements, and present the results.

What Data, Where?

As the old adage goes, what you see is what you get. In the case of database queries, the client application may see XML, HTML, or even raw data. What you get depends both upon the database format and upon how data is processed on the server.

In a low tech solution, the data could be stored on the server in a flat XML file format. If the client is capable of handling XML data, the server simply redirects the client to the file. In this case, the client must provide all of the database features including the ability to search the database, and to filter and display records. In this case, there's little or no server-side processing.

The other end of the spectrum is when the data is stored in a proprietary database and the server processes queries and returns an XML-ized data stream. In this situation, the client application makes a query, most likely a SQL query, and receives in return an XML stream representing the result set. Microsoft SQL Server 2000, for example, lets you query the database directly through a URL. By using a FOR XML clause in your SQL query, you can have the resulting record set back as an XML string. Another, more typical method is to invoke a server-side script that receives the query, posts it to a back-end database, converts the result set to XML, and returns it to the client.

The middle of the road is similar to the previous scenario: The client application invokes a server-side script that receives the query, posts it to a back-end database, and converts the result set to XML. Rather than returning XML to the client, however, the server script can perform the XML processing on the server and ship out an XML transformation (for example, HTML) instead.

The bottom line is that you may not always have control of where your XML data is processed, especially as database vendors add the ability to output XML directly to the browser. The good news is that using XSL and the DOM will work equally well in all three situations, whether you're working with flat files, XML data that has been streamed to the browser, or server-side XML.

Filtering and Searching

To demonstrate how a browser like Internet Explorer might handle searching XML data and presenting the results, I've assembled the search page shown in Figure 1. For sample data, I've included a subset of records from the XML Tools database (see Listing 1), which is located on my Web site. The XML Tools database contains information on freely available XML tools that can be found on the Web.

When you first load the search page, it comes up with a couple of HTML forms. The Search By Category form lets you select a subset of tools based on their category. For example, to bring up a list of all the parsers in the database, select "parser" from the list. When you click on the Filter button, the table at the bottom of the screen is populated with a list of tools from that tool class.

The second form lets you enter a string and select a field type by which to search. When you click on the search button, the application performs a substring match, returning all entries that contain the search string in the field selected. For example, I can search for tools created by IBM by entering "IBM" in the form's edit box, selecting the Company radio button, and hitting the Search button. Alternatively, I can return a list of all tools with "Tool" in their title by entering that string in the edit box and selecting the Products radio button.

Setting It Up

Listing 2 presents the code for our application. This is a client-side script, so you can test it easily. However, similar code could be included in an Active Server Page to provide the same functionality using server-side processing.

Listing 2 breaks neatly into three parts: an onload script that executes when the page loads; a script that processes filtering and searching; and a <FORM> element, that accepts user input. The purpose of the onload script is to create three DOM objects—one to hold the XML Tools database document; a second to contain intermediate result sets; and a third to hold the XSLT style sheet. Next, the database document (tools.xml) and an empty result-set document (prodDB.xml; see Listing 3) are loaded into their respective DOM objects. The empty document simply assists in the creation of the result, as I'll explain later.

Similarly, a style sheet is loaded into its DOM object. Initially, the style sheet chosen (generic.xsl) simply displays all records in the database, giving the user a complete list of record entries. Once the style sheet has been selected and loaded, it's applied to the XML document using the DOM's transformNode method. Thus the DOM is often a requirement when applying style sheets to XML documents.

The third part of Listing 2 contains two HTML form elements. The first form (called category) creates the drop-down list that lets the user filter the entries displayed based on their product category. It also creates a button that lets the user submit the filtering request to the application. Thus, when the user clicks on the Filter button, the ONCLICK attribute calls the filter function from the main script, passing the category selection as a parameter.

Combination

The filter function, in the main script, takes the value, determines which category was selected, and associates a particular style sheet with that category. Then, the filter function calls the changeXSL function to apply the style sheet to the XML database document. The changeXSL function simply takes the filename of a style sheet and calls on the transformNode function to apply the style sheet to the database document.

This is where processing is handed off from the DOM to XSL. You see, the purpose of the style sheet is to filter the results and define their layout and presentation. For example, if the user selects the parser category, the parser.xsl file is selected, and the changeXSL function attaches that style sheet to the document. Parser.xsl (see Listing 4) includes an <xsl:for-each> element that iterates over all of the <product> elements in the database. Within this loop, an <xsl:if> element is used to check the <product> element's category attribute for a value of parser. If the current attribute being tested is equal to parser, then the style sheet transforms that product element. Otherwise, it's ignored, and the style sheet iterates to the next <product> element in the document.

Of course, you could do this all with just the DOM. In this case, you'd have to walk the document tree using DOM methods and include an if statement to perform the filtering. You could then use the DOM object created for the resultSet object to store a "result" tree. The downside is that you'd also have to generate the display table in code, which would be a bit more work. Moreover, you'd lose a major benefit that XSL transformations offer—namely, the ability to separate presentation from logic. For example, the DOM approach doesn't let me generate custom HTML documents based on browser type, as I've described in previous columns.

Searching for Godot

The second HTML form in Listing 2 lets the user enter a string for which to search and optionally select a field on which to search. By default, these searches are performed on the product's prodName element. Again, the user initiates the search through a button. This time, when the button is selected, the ONCLICK attribute invokes the search function within the main script. The search function takes the search string and the name of the field (product name, description, price, or company) on which to search. After ensuring that the document root is not null (which would indicate a problem loading the XML database document), the search function uses the DOM's selectNodes method to retrieve a collection of prodName elements that's assigned to the products variable.

Because products is a collection, it supports a length property, which search uses in a for loop to traverse the list. Within the for loop, the function checks the srchType (which is passed in by the form element) to determine the field on which to search. That is, srchType contains a list of values from the radio buttons in our form. So, srchType[0] represents the prodSrch radio button, srchType[1] represents the descrSrch button, and so on. By examining each radio button's checked property, we determine which radio button has been selected and use that information to select the field (or subelement) on which to search. If srchType[0].checked returns true, we know the prodName field should be searched so we assign the current item in the list of prodNames to the prod variable.

Once the search field has been assigned, an if statement uses ECMAScript's search method to determine whether the searchStr pattern is contained within the current field. If the pattern occurs in the search string, the DOM's cloneNode method copies the entire product element and all of its subelements to the cloneTree variable. This cloned tree fragment is then inserted into the resultSet's document tree.

The for loop continues iterating through the list of product elements and searching the specified subelement, inserting matching entries into the resultSet document. Once the search is completed, the generic style sheet is loaded and applied to the resultSet object using another transformNode call. Finally, search resets the resultSet object by reloading prodDB.xml. Recall that prodDB.xml contains an empty prodDB element, so this action reinitializes the resultSet for the next search.

Transitions

Moving this application to an Active Server Page should be straightforward. The HTML forms in Listing 2 have to be placed into separate .html files and the ACTION attribute of each form must point to an ASP page containing your script. Your main script has to add code that gets the query information from the forms, then calls the search and filter functions. Retrieving the form data is easy using ASP's Form object. Finally, the code contained in the onload script has to be placed in the main portion of the ASP script. Recall that this code creates and loads your DOM objects. Because the code used to create these differs on the server side, you'll have to replace the call to new ActiveXObject() with ASP's Server.CreateObject function. The result can be passed to the variables as written, and the rest of the program should run as before.

Ultimately, you can use the DOM and scripting to search, filter, and present your data. However, I think you'll find that you can build more robust applications with less work when you combine your DOM scripts with XSL programming.

(Get the source code for this article here.)


Michael is the author of Building Web Sites with XML from Prentice Hall, and architect of the Rocket XML framework. He is also the publisher of LifestylesSantaCruz.com and carries the honorary title of editor at large at Web Techniques. He can be reached at mfloyd@lifestylesSantaCruz.com.




Copyright © 2003 CMP Media LLC