Presenting Data Records Using XSLT Expressions

Last month, I mentioned that just as I was putting the final touches on my column, the W3C's XSL Working Group quietly released an update to the XSL Working Draft Specification. Showing just how much the standard is in flux, the new draft moves virtually every feature except formatting objects into a separate working draft called XSL Transformations, or simply XSLT. XSL now refers specifically to XSL formatting objects, which are to XML what Cascading Style Sheets are to HTML.

By and large, XSLT contains all of the features I've covered over the past few columns including tree processing, patterns, and templates. However, XSLT adds a plethora of new features. The pattern syntax I described in June has been expanded and a new syntax called location paths has been introduced. Possibly the most dramatic change, however, is the addition of a complete expression language, which looks much like a small programming language. This month, I'll examine the salient points of XSLT expressions and give you an idea of how you can use them. I'll also deliver on a past promise to talk about iteration and conditional processing in XSL (now XSLT). Finally, I'll show how you can take any number of database records in XML format, sort them in either ascending or descending order, and transform them to HTML for presentation.

Expressions

As you'll recall from past columns, an XML document can be broken down into a collection of objects arranged hierarchically. This hierarchical representation is called a tree. (If you need a refresher, please refer to my June 1999 column in Web Techniques.) Tree structures are useful because they express the relationships among your XML elements in a very simple way. To process your XML documents, including transforming XML into HTML, you need a way to walk this tree and select particular elements. Once you have an element in hand, you can do all kinds of things to it in preparation for output. For example, you can add formatting to a headline style, generate text for a text element, or create an entirely new element. There may even be times when you want to process the same element in different ways, based on a set of constraints, or process a set of elements with a common structure (as with database records).

That's where expressions come in. The XSLT expression language lets you select one or more elements, specify conditions for processing nodes, and generate new elements that can be inserted into the result tree. It also provides general-purpose functions that let you, for example, determine the number of nodes in a tree fragment or get the position of the current node. Other functions support Boolean operations and manipulate strings and numbers. Evaluating an expression returns an object of one of five types: string, number, boolean, node-set, or result tree fragment. A string is a sequence of Unicode characters. A boolean is either true or false. A number is a floating-point real number. A node-set refers to nodes in the source tree, and a result tree fragment refers to a portion of the result tree.

Of course, an expression can simply be a pattern, as described in my June column. In that case, the expression returns the set of nodes selected by the pattern. Beyond patterns, XSLT provides functions that let you manipulate these object types. Table 1, for example, lists the proposed functions for handling strings. The most basic, string(), converts an object of another type to a string. If the object is a number, string() returns the number written as a real number, preceded by a minus sign (-) if the number is negative. Boolean values are converted to the strings true and false. If the object is a node-set, the first node in document order is selected and its value is converted to a string; an empty node-set yields an empty string. A result tree fragment is converted to a string by treating it as a single document fragment node. If the argument is omitted, it defaults to the current node.
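To make these conversion rules concrete, here is a minimal Python sketch of string(). It is a model, not a processor. It assumes Python bool/int/float stand in for XSLT booleans and numbers, and that a node-set is a list of its nodes' string values already in document order. Following the final XPath 1.0 rules, whole numbers are written without a decimal point. Very large or very small values, NaN, and infinity are not formatted exactly as a real processor would format them.

```python
def xpath_string(value):
    """Convert a value to a string following the rules described for string().

    Assumed model: bool/int/float are XSLT booleans and numbers, and a
    node-set is a list of node string values in document order.
    """
    if isinstance(value, bool):          # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        number = float(value)
        if number.is_integer():          # whole numbers print without a fraction
            return str(int(number))
        return repr(number)              # keeps the leading "-" for negatives
    if isinstance(value, list):          # node-set: first node only, or "" if empty
        return xpath_string(value[0]) if value else ""
    return str(value)


print(xpath_string(-2.5))                 # -2.5
print(xpath_string(True))                 # true
print(xpath_string(["first", "second"]))  # first
```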

A complementary function, number(), does the opposite of string(): It takes a string that represents a numeric value and converts it to that value (see Table 2). If the string does not represent a number, the function returns 0. The input string may contain surrounding white space, and Boolean values are converted to 1 (true) or 0 (false). A node-set or result tree fragment is first converted to a string and then evaluated as just described. Finally, if you omit the argument, the current node is used.
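The following Python sketch models number() as the column describes it. It makes a few assumptions. A numeric string is an optional minus sign followed by digits with an optional fraction; exponents like 1e5 are not accepted. A node-set is a list of string values. A non-numeric string yields 0, as the draft described. The final XPath 1.0 recommendation returns NaN in that case instead.

```python
import re

# Optional minus sign, then digits with an optional fraction (or a bare fraction).
NUMBER_PATTERN = re.compile(r"-?(\d+(\.\d*)?|\.\d+)")


def xpath_number(value):
    """Convert a value to a number as the column describes for number().

    Assumed model: bool/int/float are XSLT booleans and numbers, a node-set is
    a list of string values, and non-numeric strings become 0 (draft behavior;
    final XPath 1.0 returns NaN instead).
    """
    if isinstance(value, bool):
        return 1.0 if value else 0.0
    if isinstance(value, (int, float)):
        return float(value)
    if isinstance(value, list):               # node-set: use the first node's string
        value = value[0] if value else ""
    text = str(value).strip()                 # surrounding white space is allowed
    if NUMBER_PATTERN.fullmatch(text):
        return float(text)
    return 0.0


print(xpath_number(" 42 "))              # 42.0
print(xpath_number("freely available"))  # 0.0
print(xpath_number(True))                # 1.0
```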

Another useful function in Table 1 is concat(), which takes two strings and concatenates them. For example, let's say your application performs a database lookup and needs to add a label to one field in a record. You might use concat("Name: ", "Michael Floyd"). The result would be

Name: Michael Floyd

Another function, contains(), is useful for searching for substrings. It returns true if its first argument contains its second, so contains("BeyondHTML.com", "ML") returns true. What's unclear from the draft specification is whether the comparison is case-sensitive. If it is, contains("BeyondHTML.com", "ml") would return false. Two related functions, substring-before() and substring-after(), are illustrated in Example 1.
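As a rough illustration, the string functions discussed above can be modeled in a few lines of Python. This sketch follows the final XPath 1.0 recommendation. Under that recommendation, comparisons are case-sensitive and the string being searched comes first. The recommendation also lets concat() accept any number of arguments, while the column describes two. The date string is a hypothetical input; Example 1 is not reproduced here.

```python
def concat(*parts):
    """concat(): join the string arguments in order."""
    return "".join(parts)


def contains(haystack, needle):
    """contains(): true if the first string contains the second (case-sensitive)."""
    return needle in haystack


def substring_before(text, separator):
    """substring-before(): text before the first separator, or "" if absent."""
    index = text.find(separator)
    return text[:index] if index >= 0 else ""


def substring_after(text, separator):
    """substring-after(): text after the first separator, or "" if absent."""
    index = text.find(separator)
    return text[index + len(separator):] if index >= 0 else ""


print(concat("Name: ", "Michael Floyd"))        # Name: Michael Floyd
print(contains("BeyondHTML.com", "ML"))         # True
print(contains("BeyondHTML.com", "ml"))         # False
print(substring_before("1999/04/01", "/"))      # 1999
print(substring_after("1999/04/01", "/"))       # 04/01
```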

XSLT also provides several functions and operators for handling numbers. The div operator divides two numbers and returns a floating-point result as specified by the IEEE 754 standard. The quo operator also divides two numbers, but truncates the result and returns an integer. The mod operator divides two numbers and returns the remainder as an integer. For example, 10 quo 3 returns 3, while 10 mod 3 returns 1. The sum() function takes a node-set and returns the sum of the values of the nodes in the set. The round() function rounds its argument to the nearest integer. The floor() function returns the largest integer that is not greater than the argument, and the ceiling() function returns the smallest integer that is not less than the argument.
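Here is a minimal Python sketch of these numeric operators, useful for checking the arithmetic by hand. It carries three assumptions:

- quo is modeled as truncation toward zero. quo appears in this draft but did not survive into the final XPath 1.0 recommendation.
- mod follows the sign of the dividend, as in XPath 1.0.
- round() sends halves toward positive infinity, as XPath 1.0 specifies. Python's built-in round() uses banker's rounding, so it is not used here.

Division by zero is not modeled.

```python
import math


def div(a, b):
    """div: IEEE 754 floating-point division (division by zero not modeled here)."""
    return a / b


def quo(a, b):
    """quo (draft-only operator): divide, then truncate toward zero."""
    return math.trunc(a / b)


def mod(a, b):
    """mod: remainder of truncating division; the sign follows the dividend.

    math.fmod returns a float, so 10 mod 3 comes back as 1.0.
    """
    return math.fmod(a, b)


def xpath_round(x):
    """round(): nearest integer; halves go toward positive infinity."""
    return math.floor(x + 0.5)


def xpath_sum(node_values):
    """sum(): add up the numeric values of the nodes in a node-set (list of strings)."""
    return sum(float(v) for v in node_values)


print(div(10, 4), quo(10, 3), mod(10, 3))    # 2.5 3 1.0
print(xpath_round(2.5), xpath_round(-2.5))   # 3 -2
print(math.floor(-2.5), math.ceil(-2.5))     # -3 -2
print(xpath_sum(["49", "120", "0.5"]))       # 169.5
```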

Booleans

Booleans are particularly useful when comparing two values. XSLT provides five functions and five operators that let you make these comparisons; see Table 3. The boolean() function simply evaluates its argument and converts it to a Boolean. The argument can be a number, node-set, result tree fragment, or string. The next function, not(), negates the Boolean value of its argument, so not() returns true when its argument is false. The true() function always returns true, and likewise, false() always returns false.
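The column doesn't spell out how each type maps to a Boolean. The sketch below borrows the rules from the final XPath 1.0 recommendation, which may differ in detail from this draft:

- Zero and NaN are false.
- An empty node-set is false.
- An empty string is false. Any non-empty string, including "false", is true.

As before, Python values stand in for XSLT types (a list plays the node-set). Result tree fragments are not modeled.

```python
import math


def xpath_boolean(value):
    """Convert a value to a Boolean (rules as in the XPath 1.0 recommendation).

    Assumed model: bool/int/float are XSLT booleans and numbers, a list is a
    node-set, and anything else is treated as a string.
    """
    if isinstance(value, bool):
        return value
    if isinstance(value, (int, float)):
        return value != 0 and not math.isnan(value)   # zero and NaN are false
    if isinstance(value, list):
        return len(value) > 0                          # non-empty node-set is true
    return len(str(value)) > 0                         # non-empty string is true


def xpath_not(value):
    """not(): negate the Boolean value of the argument."""
    return not xpath_boolean(value)


print(xpath_boolean("false"))   # True: any non-empty string is true
print(xpath_boolean([]))        # False
print(xpath_not(0))             # True
```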

The Boolean operators listed in Table 3 directly test the values on either side of the operator. For <, >, <=, and >=, each operand is converted to a number and the two numbers are compared. For example, 1 < 2 returns true and 2 <= 1 returns false. The = operator behaves differently depending on the operand types. Numbers are compared numerically, just as with the other operators; otherwise, the operands are converted to strings and the string values are compared. The or operator converts each operand to a Boolean and returns true if either value is true. The and operator converts its operands the same way, but returns true only if both are true.
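A simplified Python model of these comparison rules shows why the conversion step matters. It handles only scalar strings and numbers; node-sets and Booleans are left out. It also assumes that = compares numerically when either operand is a number, which is how the final XPath 1.0 recommendation resolves the case. Non-numeric strings convert to 0, as the column describes for this draft.

```python
import re

# Optional minus sign, then digits with an optional fraction (or a bare fraction).
NUMBER_PATTERN = re.compile(r"-?(\d+(\.\d*)?|\.\d+)")


def to_number(value):
    """Numeric conversion for the sketch; non-numeric strings become 0."""
    if isinstance(value, (int, float)):
        return float(value)
    text = str(value).strip()
    return float(text) if NUMBER_PATTERN.fullmatch(text) else 0.0


def is_number(value):
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def less_than(a, b):
    """<: both operands are converted to numbers first."""
    return to_number(a) < to_number(b)


def less_or_equal(a, b):
    """<=: both operands are converted to numbers first."""
    return to_number(a) <= to_number(b)


def equals(a, b):
    """=: numeric comparison if a number is involved, else string comparison."""
    if is_number(a) or is_number(b):
        return to_number(a) == to_number(b)
    return str(a) == str(b)


print(less_than("9", "10"))   # True: both operands become numbers
print("9" < "10")             # False: a plain string comparison would disagree
print(equals("10", 10))       # True
print(equals("1.0", "1"))     # False: two strings are compared as strings
```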

XML lets you specify the language of an element's content using the xml:lang attribute. The lang() function examines this value for the current node and compares it to the language specified in its argument. If the current node has no xml:lang attribute, lang() looks up the tree for the nearest ancestor that specifies one and uses that value. If no ancestor specifies xml:lang, the lookup fails and lang() returns false.
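The ancestor lookup can be sketched with Python's standard xml.etree.ElementTree module. ElementTree has no parent pointers, so the sketch builds a child-to-parent map first. The matching rule comes from the final XPath 1.0 recommendation: the match is case-insensitive, and "en" also matches sublanguages such as "en-US". This draft may differ. The document and the helper name lang() are hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree reports xml:lang under the predefined XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def lang(node, wanted, parents):
    """Return True if the nearest xml:lang on node or an ancestor matches wanted."""
    while node is not None:
        value = node.get(XML_LANG)
        if value is not None:                     # nearest declaration wins
            value, wanted = value.lower(), wanted.lower()
            return value == wanted or value.startswith(wanted + "-")
        node = parents.get(node)                  # climb to the parent element
    return False                                  # no xml:lang anywhere above


# Hypothetical document: the second record overrides the inherited language.
doc = ET.fromstring(
    '<productDB xml:lang="en-US">'
    '<product><name>Parser</name></product>'
    '<product xml:lang="de"><name>Werkzeug</name></product>'
    '</productDB>'
)
# Map every child element to its parent so we can walk up the tree.
parents = {child: parent for parent in doc.iter() for child in parent}
first_name, second_name = doc.findall(".//name")

print(lang(first_name, "en", parents))   # True: inherits en-US from productDB
print(lang(second_name, "en", parents))  # False: nearest declaration is de
```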

Extension Functions

While the expression language contains features found in a programming language, it's not intended to be one. Instead, XSLT provides an extension mechanism that lets you access languages such as JavaScript, VBScript, and Java. The specification doesn't require an XSLT processor to support extensions for any particular language, so you'll want to check the documentation for your specific processor for this support. We'll examine extension functions another time.

Putting XSLT to Work

XSLT provides a number of additional features that make it easier to process elements. For example, the for-each element instructs the processor to perform iterative processing. This is particularly useful when you need to process a large number of elements that share the same structure, such as a collection of elements that represent records in a database. Consider the XML document in Listing One, which represents some of the tools in the XML tools database at BeyondHTML.com. The document represents a database table called productDB. Each record is a product element, and the elements nested within it represent the fields. For simplicity, Listing One presents just three records.

The goal of this example is to publish some of the fields from each record as a summary within an HTML table. The summary could be a hit list resulting from searching the database. In any case, we would like to transform some (but not all) of the XML record elements in Listing One into HTML and publish each summary in a row of the table. The columns represent the product's name, version, and price, respectively. Let's further stipulate that we'd like to sort each entry in ascending order based on the product's name.

Listing Two presents the XSL style sheet to execute our transformation. The style sheet contains a single template rule that uses a step pattern to select the document element, productDB. Next, the template generates some preliminary HTML elements including the page TITLE, appropriate labels, and the start of the HTML table. The first row of the table contains the headings for each column.

Next, the style sheet uses a for-each element to process each product record. Without such a construct, we'd have to write a separate transformation for every record in the database. Not only would this be tedious, but we have no way of knowing in advance how many records a search will return. The XSL processor, with the help of the for-each element, figures that out for us. Before doing anything else, Listing Two uses <xsl:sort> to sort the product nodes in ascending order. The select attribute identifies the sort key. The sort element takes some additional attributes, including order (the sort order, ascending or descending), lang (the language of the sort keys), and data-type (whether the keys are compared as text or as numbers). By default, the sort order is ascending and the data type is text, so I've left these out.
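To show what the for-each plus sort combination does, here is a rough Python analogue built on xml.etree.ElementTree. It is not how an XSLT processor works internally. The records are hypothetical, since Listing One isn't reproduced here. The helper sorted_products() is an invented name. The key is compared as plain, case-sensitive text to approximate data-type="text"; a real processor may apply language-aware collation.

```python
import xml.etree.ElementTree as ET

# Hypothetical records standing in for Listing One, which isn't reproduced here.
SOURCE = """<productDB>
  <product><name>Zeta Parser</name><version>1.2</version><price>49</price></product>
  <product><name>Alpha Editor</name><version>0.9</version><price>freely available</price></product>
  <product><name>Beta Validator</name><version>2.0</version><price>120</price></product>
</productDB>"""


def sorted_products(root, key="name", descending=False):
    """Rough analogue of iterating over product elements with a text sort on key."""
    products = root.findall("product")
    return sorted(products,
                  key=lambda product: product.findtext(key, default=""),
                  reverse=descending)


root = ET.fromstring(SOURCE)
for product in sorted_products(root):
    print(product.findtext("name"))   # Alpha Editor, Beta Validator, Zeta Parser
for product in sorted_products(root, descending=True):
    print(product.findtext("name"))   # Zeta Parser, Beta Validator, Alpha Editor
```

Sorting the same records on price as text puts "120" before "49". That is why data-type matters when the sort key holds numbers.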

I've nested three additional <xsl:for-each> elements within the first, one for each field we want to process. After selecting the appropriate field element, the style sheet creates a column in the table and calls <xsl:apply-templates> to process the element. I've thrown a curve with the last field. Some products in the database are priced at a specific dollar value, but others are available for free. We could simply enter $0 for freeware items, but I decided to write out "freely available" in the record. The style sheet deals with this by using <xsl:if> to test whether the price is a numeric value. If so, it adds a dollar sign in front of the value. Otherwise, the value is text and the template leaves it alone.
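The whole transformation, from sorted records to HTML table rows with the conditional dollar sign, can be approximated in Python. The exact test in Listing Two's <xsl:if> isn't shown here. This sketch assumes a price counts as numeric if it is an optional minus sign followed by digits and an optional fraction. The records and the helper names price_cell() and render_table() are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from html import escape

NUMBER_PATTERN = re.compile(r"-?(\d+(\.\d*)?|\.\d+)")

# Hypothetical records; the real data lives in Listing One.
SOURCE = """<productDB>
  <product><name>Zeta Parser</name><version>1.2</version><price>49</price></product>
  <product><name>Alpha Editor</name><version>0.9</version><price>freely available</price></product>
</productDB>"""


def price_cell(price):
    """Prefix numeric prices with "$"; leave text such as "freely available" alone."""
    price = price.strip()
    return "$" + price if NUMBER_PATTERN.fullmatch(price) else price


def render_table(root):
    """Emit one table row per product, sorted by name, with name, version, price."""
    rows = ["<TABLE>", "<TR><TH>Name</TH><TH>Version</TH><TH>Price</TH></TR>"]
    products = sorted(root.findall("product"),
                      key=lambda p: p.findtext("name", default=""))
    for product in products:
        cells = [product.findtext("name", default=""),
                 product.findtext("version", default=""),
                 price_cell(product.findtext("price", default=""))]
        rows.append("<TR>" + "".join(f"<TD>{escape(c)}</TD>" for c in cells) + "</TR>")
    rows.append("</TABLE>")
    return "\n".join(rows)


print(render_table(ET.fromstring(SOURCE)))
```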

Conclusion

There is, of course, a great deal more to XSLT, including location paths and extensions to external languages, which we'll cover in the future. However, the other part of the equation, XSL, has yet to be addressed. While we have a bright and shiny new XSL working-draft specification, there's not a processor on the planet that (at the time of this writing) supports it.

My guess is that you won't see one before the end of the year. That's because Sun and Adobe announced cash bounties totaling some $90,000 for the best implementation of XSL formatting objects. The implementation must be developed independently and the winning entry will likely be placed in the public domain. However, the finalists won't be announced until the Graphic Communications Association's XML 99 conference in December. In the meantime, we'll just have to hope that someone like IBM will address the issue.



Michael publishes BeyondHTML.com, and serves as Web Techniques' editor at large. His upcoming book, Building Web Sites with XML, is due out from Prentice Hall later this year. He can be reached at mfloyd@lifestylesSantaCruz.com.




Copyright © 2003 CMP Media LLC