Building Xhsc Specifications

This section explains the features that are supported in Xhsc specifications, and how you can employ these features using the two main HSC tags. Note that the content of an Xhsc specification is the same whether you use Xhsc as an application or as a package in another Java application.

Heads and Groups

An Xhsc specification can contain two main kinds of instructions: head instructions and group instructions. Each kind of instruction designates a kind of tag in the input document where you want to create another layer of hierarchy. Every instruction has a level or priority value associated with it. An instruction with a lower value is meant to enclose or dominate instructions with higher values.

A group instruction specifies elements of the input document that should be grouped together under a parent element. The example below shows a typical group transformation we might want to make: grouping several recipient elements under a recipientInfo element.

Example 4. Simple Group Transformation


INPUT:
    <invoicename>Invoice 1</invoicename>
    <recipient>Bob Smith</recipient><br/>
    <recipient>Customer #<custno>155555</custno></recipient>
    <recipient><phone>555-1212</phone></recipient>
    <itemname>Teddy Bear</itemname>

OUTPUT:
    <invoicename>Invoice 1</invoicename>
    <recipientInfo>
      <recipient>Bob Smith</recipient><br/>
      <recipient>Customer #<custno>155555</custno></recipient>
      <recipient><phone>555-1212</phone></recipient>
    </recipientInfo>
    <itemname>Teddy Bear</itemname>

This kind of transformation is very easy to specify as an Xhsc instruction. We would use a group instruction similar to the one below.

Example 5. Simple Group Instruction


<hsc:group name="recipient" level="20">
   <recipientInfo></recipientInfo>
</hsc:group>

It is important to set the level attribute correctly on all your Xhsc instructions. The group instruction includes intervening tags in the group as long as they have the same or higher level values. For the example above, we must assume that the itemname tag was designated by some other instruction in the Xhsc specification as having a lower level than the recipient tag. Don't worry about the "hsc:" prefix on the tag; it is covered shortly, below.

The other kind of Xhsc instruction is a head instruction. Use a head instruction to identify a tag in the input document that is the beginning of a sequence of elements that should be collected together under a new parent element. The example below shows a typical head transformation we might want to make: grouping an invoicename tag and everything that follows it within a new invoice tag.

Example 6. Simple Head Transformation


INPUT:
    <invoicename>Invoice 1</invoicename>
    <itemname>Teddy Bear</itemname>
    <invoicename>Invoice 2</invoicename>
    <itemname>Yo-Yo</itemname>
    <itemname>Squeaky Duck</itemname>
    
OUTPUT:
    <invoice>
      <invoicename>Invoice 1</invoicename>
      <itemname>Teddy Bear</itemname>
    </invoice>
    <invoice>
      <invoicename>Invoice 2</invoicename>
      <itemname>Yo-Yo</itemname>
      <itemname>Squeaky Duck</itemname>
    </invoice>

This kind of transformation is also very easy to specify as an instruction, as shown in the example below.

Example 7. Simple Head Instruction


<hsc:head name="invoicename" level="2">
   <invoice></invoice>
</hsc:head>

Again, the level value associated with an instruction is critical. The head instruction includes following elements in the group as long as they have a higher level value. This makes it easy to express relationships that you would like to model in your output document: elements that should be deeper in the tree should have higher values for their level attributes.
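
To see how the level values interact, here is a hand-worked sketch that combines a head instruction for invoicename (level 2), a group instruction for recipient (level 20), and a head instruction for itemname (level 6). The expected output is derived from the rules described in this section rather than captured from an actual Xhsc run, so treat it as an illustration of the intended nesting; whitespace and exact formatting may differ.

Xhsc SPECIFICATION:
    <?xml version="1.0"?>
    <hsc:hsc xmlns:hsc="urn:nz-xml-hsc/version1">
      <!-- Lowest level value: invoice encloses everything below. -->
      <hsc:head name="invoicename" level="2">
        <invoice></invoice>
      </hsc:head>
      <!-- Highest level value: recipientInfo is the most deeply nested. -->
      <hsc:group name="recipient" level="20">
        <recipientInfo></recipientInfo>
      </hsc:group>
      <!-- Each itemname starts a new item inside the current invoice. -->
      <hsc:head name="itemname" level="6">
        <item></item>
      </hsc:head>
    </hsc:hsc>

INPUT:
    <invoicename>Invoice 1</invoicename>
    <recipient>Bob Smith</recipient>
    <recipient>Customer 155555</recipient>
    <itemname>Teddy Bear</itemname>
    <invoicename>Invoice 2</invoicename>
    <itemname>Yo-Yo</itemname>

EXPECTED OUTPUT (derived by hand):
    <invoice>
      <invoicename>Invoice 1</invoicename>
      <recipientInfo>
        <recipient>Bob Smith</recipient>
        <recipient>Customer 155555</recipient>
      </recipientInfo>
      <item>
        <itemname>Teddy Bear</itemname>
      </item>
    </invoice>
    <invoice>
      <invoicename>Invoice 2</invoicename>
      <item>
        <itemname>Yo-Yo</itemname>
      </item>
    </invoice>

In this sketch the invoice element (level 2) encloses both the recipientInfo group (level 20) and the item elements (level 6) because their level values are higher. The recipient group ends at the first itemname because 6 is lower than 20, and each invoice ends at the next invoicename because its level is not higher than 2.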

How Xhsc Works with Plain Text

In text input mode (-t option), Xhsc treats the plain text input as if it were an XML information set. This internally synthesized information set consists of a root element whose children are individual character data nodes, each containing a single line of text. You can specify the root element using an Xhsc root instruction, or you can let Xhsc create a root element for you, with the name "root".

Processing a plain text file to generate XML works much like processing XML data, except that the contents of the text lines, rather than element tags, indicate the structure. [Need an example here!]

XML Syntax for Xhsc Specifications

This sub-section describes the mechanics of expressing Xhsc instructions in an XML file. Xhsc is pretty lax about its XML document structure, but it does have some requirements that you must meet.

The instructions of an Xhsc specification may be in their own document, or interspersed through some other kind of document (typically an XSLT stylesheet). When an Xhsc specification is the entire XML document, it is customary to make the root element of the document the tag hsc. The example below shows a simple Xhsc specification as a standalone XML document.

Example 8. A Small Complete Xhsc Specification


<?xml version="1.0"?>
<hsc:hsc xmlns:hsc="urn:nz-xml-hsc/version1">
  <hsc:head name="invoicename" level="2">
    <invoice></invoice>
  </hsc:head>
  <hsc:group name="recipient" level="20">
    <recipientInfo></recipientInfo>
  </hsc:group>
  <hsc:head name="itemname" level="6">
    <item>
      <itemno>0</itemno>
    </item>
  </hsc:head>
</hsc:hsc>

Xhsc Instruction Tags and Their Attributes

This section describes all the attributes that may be supplied with Xhsc head and group instruction tags. Some of the attributes identify input elements to be processed, and the others affect the processing that is carried out on them. Note that at most one head and one group instruction can match any single element of the input document (if there are multiple possible matches, then Xhsc always uses the one which appeared first in the specification).
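
As a small sketch of the first-match rule, the hypothetical specification below contains two head instructions that both name itemname. According to the rule above, only the first would ever be applied. The output element names item and product, and the level values, are chosen purely for illustration.

    <?xml version="1.0"?>
    <hsc:hsc xmlns:hsc="urn:nz-xml-hsc/version1">
      <!-- First head instruction for itemname: the one Xhsc would use. -->
      <hsc:head name="itemname" level="6">
        <item></item>
      </hsc:head>
      <!-- Second head instruction for the same name: never applied,
           because an earlier head instruction already matches itemname. -->
      <hsc:head name="itemname" level="8">
        <product></product>
      </hsc:head>
    </hsc:hsc>

Note that the rule applies per kind of instruction: a head instruction and a group instruction may both match the same input element.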

The attributes below help identify input document elements to be processed.

The following instruction attributes affect how an input element is processed once a matching instruction has been found.

The descriptions and semantics of the various instruction attributes can be a little complex. Some examples may help to clarify how some of the attributes can be used.

Hierarchy and the requires Attribute

If you are using Xhsc as part of the processing to generate XML that complies with a particular DTD or schema, there may be explicit rules about how elements may be nested. For example, your invoice DTD might specify that an itemname element can only appear inside an invoiceitem element, which can only appear inside an invoice element. However, the input you receive may not always supply all the right instruction matches to build the entire hierarchy correctly. The requires attribute on an Xhsc instruction can help ensure that your output document contains all the right nested elements. Here is a small example that shows how the requires attribute works.

Example 9. Using the requires Attribute


INPUT:
   <?xml version="1.0"?>
   <html>
   <body>
     <itemname>Teddy Bear</itemname>
     <itemname>Yo-Yo</itemname>
   </body>
   </html>

Xhsc SPECIFICATION:
   <?xml version="1.0"?>
   <hsc:hsc xmlns:hsc="urn:nz-xml-hsc/version1">
     <hsc:head id="inv" name="invoicename" level="2">
       <invoice></invoice>
     </hsc:head>
     <hsc:group name="itemname" level="6" requires="inv">
       <item></item>
     </hsc:group>
   </hsc:hsc>

OUTPUT:
   <?xml version="1.0" encoding="ASCII"?>
   <html>
   <body>
    <invoice>
      <item><itemname>Teddy Bear</itemname>
      <itemname>Yo-Yo</itemname>
      </item>
    </invoice>
   </body>
   </html>

More About the class Attributes

Xhsc instructions match the elements of your input document primarily by name. However, you can use the Xhsc instruction attributes described in this subsection to tighten the matching somewhat, by testing the value of exactly one attribute of the input element. (Xhsc cannot perform the much more sophisticated matching of XSLT, because it does not support XPath expressions.)

To restrict an Xhsc instruction to be applied only to input elements with a particular value for an attribute, you must include the classattrname and class attributes on the Xhsc instruction. The classattrname attribute tells Xhsc which input element attribute to consider, and the class attribute tells it the value to be matched. This feature of Xhsc is intended primarily for matching CSS-styled XHTML, where the role of a paragraph tag p can be very different, depending on the value of its class attribute.
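
A minimal sketch of this kind of matching follows, assuming an XHTML input whose p elements carry the hypothetical class values chapter-title and section-title. The class values, the level values, and the chapter and section output elements are invented for illustration.

Xhsc SPECIFICATION:
    <?xml version="1.0"?>
    <hsc:hsc xmlns:hsc="urn:nz-xml-hsc/version1">
      <!-- Applies only to p elements whose class attribute is chapter-title. -->
      <hsc:head name="p" classattrname="class" class="chapter-title" level="2">
        <chapter></chapter>
      </hsc:head>
      <!-- Applies only to p elements whose class attribute is section-title. -->
      <hsc:head name="p" classattrname="class" class="section-title" level="4">
        <section></section>
      </hsc:head>
    </hsc:hsc>

INPUT ELEMENTS THESE INSTRUCTIONS WOULD MATCH:
    <p class="chapter-title">Getting Started</p>
    <p class="section-title">Installation</p>

Both instructions name the same element, p, so it is the classattrname and class attributes that decide which instruction applies to a given paragraph.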

If your Java installation includes the java.util.regex regular expression package, then the class attribute that you supply in the Xhsc instruction will be taken as a regular expression in java.util.regex syntax, and anchored matching will be used to test whether the attribute value from the input document element matches the instruction. Otherwise, case-insensitive string matching will be used.
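
For instance, when regular expression support is available, a single instruction could cover a family of class values. The sketch below uses the hypothetical class values heading1 through heading3, and it assumes that anchored matching means the pattern must match the whole attribute value.

    <?xml version="1.0"?>
    <hsc:hsc xmlns:hsc="urn:nz-xml-hsc/version1">
      <!-- The class value is treated as a pattern: it would match
           heading1, heading2 and heading3. Under the whole-value
           assumption it would not match subheading1 or heading10. -->
      <hsc:head name="p" classattrname="class" class="heading[1-3]" level="4">
        <section></section>
      </hsc:head>
    </hsc:hsc>

Without regular expression support, the same class value would be compared as a plain string, so this instruction would match only an attribute value that is literally heading[1-3] (ignoring case).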