Patterns in XSL

By Michael Floyd

Last month I showed how you can assemble a workbench of XML tools that together will let you serve XML documents from your Web site. One of the tools on that workbench, XML Enabler, is a Java servlet that lets you map an XSL style sheet to a specific browser. This, in turn, lets you attach a style sheet to an XML document that's capable of generating HTML specific to that browser's capabilities. Imagine being able to position a DIV block and have it look the same in Microsoft Internet Explorer 5 and in Netscape Navigator 4, or to be able to render an XML document in any browser, including Lynx. The strength in delivering such documents, as you'll see, comes from the eXtensible Style Language (XSL).

There's a lot to XSL, so this month, I'd like to show how you can use patterns to locate objects within the document tree. From there, you'll be able to specify template rules that let you format these objects. Norman Walsh provides a good introduction to XSL in his article "XSL: The Extensible Style Language" (see Web Techniques, January 1999). Since I'll focus on XSL patterns, you may want to refer to this article for an overview of XSL. (Note that there have been minor changes in the syntax since this story appeared. In particular, the <xsl:process-children> element, which processes the children of the current node, has been renamed <xsl:apply-templates>. Also, the article refers to the pattern attribute as the means for specifying XSL patterns. That attribute has since been renamed match.)

XSL Processing

To quickly review, an XML processor takes a marked-up document and produces a tree-like structure containing the elements, attributes, entities, and so on, of your document. At this point, you can access the "objects" in this tree through any Document Object Model (DOM) API. You can also invoke an XSL processor to apply formatting to these objects and output them in just about any manner you can think of. The XSL processor takes the tree generated in XML, called the "source tree," and creates a new "result tree" that includes all of the objects to be output along with pertinent formatting information.

As a style-sheet developer, you can control this process through template rules. You define a template rule in a style sheet using the <xsl:template> element. Your rule generally consists of two parts: a pattern that's used to match with elements, attributes, and other nodes in the source tree; and the template that generates part of the result tree. For example, the template rule in Example 1 looks for paragraph elements in the source tree. When the processor finds such an element, the formatting portion of the template rule is applied to the paragraph content.
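To make the two parts of a template rule concrete, here is a minimal sketch (not a reproduction of Example 1). The paragraph element name comes from the discussion above. The HTML P output and the namespace URI are assumptions: the URI is the one used by the XSLT 1.0 Recommendation, which may not be the one the draft-era tools expect.

```xml
<?xml version="1.0"?>
<!-- Sketch only: the namespace URI below is the XSLT 1.0 one, assumed here.
     A processor built for an earlier draft may require a different URI. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Pattern: match every paragraph element in the source tree -->
  <xsl:template match="paragraph">
    <!-- Template: wrap the paragraph's content in an HTML P element -->
    <P>
      <!-- Process the paragraph's children (text and inline markup) -->
      <xsl:apply-templates/>
    </P>
  </xsl:template>

</xsl:stylesheet>
```

Everything in the match attribute is the pattern; everything between the opening and closing <xsl:template> tags is the template that goes into the result tree. An XSLT 1.0 processor such as xsltproc should be able to run a sketch like this as written.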

Your template rules are placed inside an <xsl:stylesheet> element, as shown in Example 2. The style-sheet element has several optional attributes, which are listed in Table 1. However, Example 2 simply defines the namespace, xmlns:xsl, which is a required step. Note that the namespace must point to the URI shown in the example. Next, a namespace for the result object is defined. Example 2 creates a namespace, fo, for formatting objects and assigns it to the result namespace.

The fact is, there are many unresolved questions related to the implementation of formatting objects in a device-independent manner. So, the formatting-objects portion of the XSL proposal is still in limbo. Asked whether Microsoft supports formatting objects, the company's XML evangelist, Adam Bosworth, responded that Microsoft does not support the formatting-objects portion of the W3C proposal, nor does it have plans to support it in the future. Marie Wieck, director of IBM's technology network computing software division, concedes that there are still ambiguities in the proposal and that more work needs to be done. However, IBM will support the XSL standard when it solidifies, assuming customer demand warrants it.

Fortunately, the URI in Example 2 can point at things other than fo. For example, later in this column I'll transform an XML document and output HTML. The first step in setting up such a transformation is defining the following namespace:

xmlns="http://www.w3.org/TR/REC-html40"

result-ns=""

In this case, the result-ns is optional and is included for illustration purposes. As you develop your style sheets, keep in mind that XSL-defined elements are recognized only in the style sheet, not in the source document. Of course, this is just the nickel tour of XSL processing, and there's a great deal missing from this discussion. However, it should give you a basis from which to explore XSL patterns.

Patterns

Returning to our source tree, it would be nice to be able to locate any node and then apply specific formatting to that node. That's where patterns come in. You use a pattern to select a node or set of nodes in the source document. In this way, you can control how the XSL processor processes your document. The syntax for creating patterns is straightforward and resembles the paths used in directory structures. Therefore, it's helpful to remember that the patterns you specify are always in relation to your current position in the tree. The simplest pattern is an element type. For example, the pattern in <xsl:template match="chapter"> matches any child element that's a chapter.

There are a number of operators that let you control how to search for patterns within the tree. As I mentioned, pattern syntax resembles the syntax used for traversing directory structures. For instance, a period (.) represents the current node in the tree, just as it would represent the current directory in a directory structure. Likewise, two periods (..) refer to the parent of the current node. The slash character (/) lets you step down through the tree one level at a time. For example, chapter/title/paragraph starts at the current node, looks for a chapter child, then a title child of that chapter, and ultimately matches any paragraph children of that title. Again, this is very much like using directory paths, so it should feel intuitive.

You can also use the wildcard character (*) to match all elements. For example, */subhead selects all subhead grandchildren of the current node. On the other hand, chapter/* matches any element that has a chapter parent. Another operator, //, matches descendants instead of children. For example, chapter//subhead matches all subhead elements with a chapter ancestor.

You can create alternative paths through the tree using the or operator (|). For instance, you could select either a chapter or an appendix using chapter|appendix. You can also string longer patterns together. As you construct your patterns, however, keep in mind that / binds more tightly than |. For example, */chapter/title | ../preface/title would select either a title whose chapter parent is a grandchild of the current node, or a title inside a preface child of the current node's parent. You may have noticed that I've added some white space between the selectors on either side of the or operator. White space is not significant, so you can break things as you like for better readability.
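The operators above can be sketched in a handful of template rules. The chapter, appendix, title, and subhead names come from the examples in the text; the HTML elements in the output and the XSLT 1.0 namespace URI are my assumptions, so treat this as an illustration rather than a tested style sheet for any particular draft-era processor.

```xml
<?xml version="1.0"?>
<!-- Sketch only: assumes the XSLT 1.0 namespace URI -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- | : either a chapter or an appendix element -->
  <xsl:template match="chapter | appendix">
    <DIV><xsl:apply-templates/></DIV>
  </xsl:template>

  <!-- / : title elements whose parent is a chapter -->
  <xsl:template match="chapter/title">
    <H1><xsl:apply-templates/></H1>
  </xsl:template>

  <!-- // : subhead elements at any depth below a chapter -->
  <xsl:template match="chapter//subhead">
    <H2><xsl:apply-templates/></H2>
  </xsl:template>

</xsl:stylesheet>
```

One caution: in XSLT 1.0, the parent step (..) is accepted in a select attribute but not in a match attribute, so a pattern such as ../preface/title belongs in a select there.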

Other Node Types

So far, I've described the syntax for accessing element nodes within the source tree. But a node can contain other objects as well. To distinguish these other objects, you'll have to identify them for the XSL processor. For example, to identify an object as an attribute, you must prefix the attribute name with the @ symbol. The pattern syntax is pretty much the same, though. The figure/@caption pattern selects the caption attribute of the figure element, which is a child of the current node. The @* pattern selects all attributes.
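As a rough sketch of the attribute syntax, the rules below pull a figure's caption attribute into the output. The figure and caption names come from the text; the P element, the CLASS value, and the XSLT 1.0 namespace URI are assumptions for illustration.

```xml
<?xml version="1.0"?>
<!-- Sketch only: assumes the XSLT 1.0 namespace URI -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- For each figure element, process only its caption attribute -->
  <xsl:template match="figure">
    <P CLASS="caption">
      <xsl:apply-templates select="@caption"/>
    </P>
  </xsl:template>

  <!-- figure/@caption: the caption attribute of a figure element -->
  <xsl:template match="figure/@caption">
    <!-- Emit the attribute's value as text -->
    <xsl:value-of select="."/>
  </xsl:template>

</xsl:stylesheet>
```

Note that the select="@caption" pattern is evaluated relative to the figure element being processed, which is the "current node" in the sense described earlier.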

You can select comments in the source tree using the comment() pattern, which takes no arguments and selects all comment nodes. Similarly, the pi() pattern matches all processing-instruction child nodes. In addition, you can specify an argument that names the target of the processing instruction, such as pi("xml-stylesheet").

Tests, Comparisons, and Refinement

You can refine the result returned by a pattern by adding a test within square brackets ([ ]) after the pattern. For example, list[@type] matches list elements that have a type attribute, and book[editor] matches child book elements that have at least one editor child element.

Another thing XSL lets you do is compare patterns to strings. For example, list[@type="ordered"] matches list elements whose type attribute has the value ordered, and figure[@caption="Figure1"] matches figure elements whose caption is "Figure1". Finally, contact[name="Joe Butler"] selects the contact child element that has a name child with the value "Joe Butler".
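Here is a minimal sketch of tests and comparisons at work. The list, type, contact, and name identifiers come from the examples above; the HTML output elements and the XSLT 1.0 namespace URI are assumptions on my part.

```xml
<?xml version="1.0"?>
<!-- Sketch only: assumes the XSLT 1.0 namespace URI -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Any list element: render as a bulleted list by default -->
  <xsl:template match="list">
    <UL><xsl:apply-templates/></UL>
  </xsl:template>

  <!-- Refinement: list elements whose type attribute equals "ordered".
       In XSLT 1.0 a pattern with a test outranks the bare element name. -->
  <xsl:template match="list[@type='ordered']">
    <OL><xsl:apply-templates/></OL>
  </xsl:template>

  <!-- contact elements that have a name child with the value "Joe Butler" -->
  <xsl:template match="contact[name='Joe Butler']">
    <B><xsl:apply-templates/></B>
  </xsl:template>

</xsl:stylesheet>
```

Because the pattern sits inside a double-quoted attribute, the string being compared is written with single quotes here. The two quoting styles mean the same thing as long as they alternate.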

XSL also lets you test for positions relative to a sibling. In particular, you can select the first and last child elements in a branch, as well as the first and last elements of their respective types. Table 2 lists the options.

Putting Patterns to Work

I've omitted some additional syntactical details, but I've given you the foundation to create some very powerful patterns. This will let you access and ultimately format and output virtually any object within your source tree. To demonstrate in a real-world sense, I've created an XML document and an accompanying XSL style sheet that will transform our document into HTML. The XSL style sheet shows how you can combine CSS style rules with XSL to format HTML. This is, in fact, how I anticipate most Web developers will handle XML.

Listing One presents news.xml, an XML document containing a news story that recently ran on my Web site. The root element, Story, contains the other elements for this document. I've created elements for the section in which the story runs, along with elements for the title, dek (subtitle), byline, dateline, and so on. The BodyText element contains the content for the news story, and includes additional markup to create a dropcap for the leading character in the first paragraph, and some bold and italics. Since I'm not interested in validating this document, no document type definition (DTD) is specified.

The style sheet for this document, news.xsl in Listing Two, contains the template rules to process Listing One. When a rule maps to a source element, the rule's template is instantiated. The templates may contain literal "result" elements, character data, and instructions for creating a portion of the result tree. So, after creating the namespaces for the <xsl:stylesheet> element as detailed earlier, Listing Two creates a template rule to process the document starting from the top, where the Story element lives. The root node is a special case, so Listing Two uses the / pattern to match it. (If you need to access the document element itself, you can use the pattern: /*.)

Next, the template includes some "literal" HTML that will be passed directly to the result tree. Note that this includes the CSS style rules for formatting the document. I covered CSS in a previous column (see "Cascading Style Sheets: To Hell with Standards," Web Techniques, March 1999). These are essentially the same style rules used in that column, so refer back to it for details on formatting.

After the style rules, I've included an HTML <TITLE> element, which specifies a title for the document. The title comes from the SectionTitle of the XML document, so I need to process a portion of it to get this title. I'll use <xsl:apply-templates>, which processes descendant nodes. You specify the nodes to be processed by using a pattern in a select attribute, in this case, select="Story/SectionTitle". If no select attribute is included, then apply-templates would process the immediate children of the current node. Now, the "News&Views" title will appear in the title bar of the browser.
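The shape of such a root rule can be sketched as follows. This is not Listing Two: the Story, SectionTitle, and BodyText names come from the description of news.xml, while the overall layout and the XSLT 1.0 namespace URI are assumptions, and the CSS style rules are left out.

```xml
<?xml version="1.0"?>
<!-- Sketch only: assumes the XSLT 1.0 namespace URI; not the actual news.xsl -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- / matches the root node, the parent of the Story document element -->
  <xsl:template match="/">
    <HTML>
      <HEAD>
        <TITLE>
          <!-- Process only the SectionTitle child of Story -->
          <xsl:apply-templates select="Story/SectionTitle"/>
        </TITLE>
      </HEAD>
      <BODY>
        <!-- Process only the BodyText child of Story -->
        <xsl:apply-templates select="Story/BodyText"/>
      </BODY>
    </HTML>
  </xsl:template>

</xsl:stylesheet>
```

With no template rule of its own, SectionTitle falls through to the processor's built-in behavior in XSLT 1.0, which copies its text into the TITLE element.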

Next, Listing Two creates the Body of the HTML document, using apply-templates and a select pattern to similarly process the story title, dek (subtitle), byline, and text elements of the document. HTML SPAN and DIV elements are used to apply the CSS styles. I've included additional templates to handle specific elements within the document. There are template rules to handle paragraph formatting for the BodyText, format the dropcap, handle bold and italics, and to create a mailto anchor that's included in the author's byline.

You can test this yourself using tools from the XML workbench I described last month. In particular, you'll need to install the XML4J parser and LotusXSL (go to www.alphaWorks.ibm.com for more information). From there you can run this example from the command line. The resulting document is shown in Figure 1. Although I have not tested this example in Internet Explorer 5, it should behave similarly.

Conclusion

The key to applying templates to document elements is patterns. In fact, patterns let you access any object in your document tree. Without question, there's a lot more to template rules than I've covered here. For example, template rules let you create new elements, attributes and attribute sets, comments, and more. I'll examine all that you can do with template rules in the future.

Ultimately, though, XSL's contribution to Web developers is its tremendous ability to transform XML documents. By adding new style sheets for different output formats, you can create transformations for virtually any medium you desire. You can even generate custom transformations for specific browsers. That means you can create Web sites that push the envelope with new features without leaving behind users with older browsers. It also means you can render XML in any browser, even Lynx.

(Get the source code for this article here.)


Michael publishes BeyondHTML.com, speaks on XML in the Ken North Expert Seminars, and serves as Web Techniques' editor at large. His upcoming book for XML Web developers should be available this fall. He can be reached at mfloyd@lifestylesSantaCruz.com.




Copyright © 2003 CMP Media LLC