A Simple Example

This section gives a quick example of using Xhsc on an XML document, in order to convey how it works and set the stage for the detailed descriptions to follow.

In the example, the problem we are trying to solve is to take flat XHTML that describes a part of an invoice, and turn it into structured XML that more closely models the semantics of an invoice. Here is a sample of our input data.

Example 1. Sample Input


<?xml version="1.0"?>
<html>
<head><title>Invoices for Dec 24</title></head>
<body>
<h1>Invoice 1</h1>
<p class="recipient">Bob Smith</p>
<p class="recipient">123 Main Street</p>
<p class="recipient">Annapolis, MD</p>
<h2>item 1</h2>
<p>Teddy Bear</p><p>$14.99</p>
<h2>item 2</h2>
<p>Yo-Yo</p><p>$5.98</p>
<h1>Invoice 2</h1>
<p class="recipient">Jane Doe</p>
<p class="recipient">#18798661</p>
<h2>item 1</h2>
<p>Tech Support Call</p><p>$55.00</p>
</body></html>

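As you can see, the information set of the input does not reflect the semantic structure of an invoice: every heading and paragraph is a sibling inside body. To get closer to the structure of an invoice, we would like to make the following changes:

1. Wrap each h1 heading, together with everything that follows it up to the next h1, in an invoice element.
2. Wrap each h2 heading, together with the item paragraphs that follow it, in an invoiceitem element.
3. Collect each run of consecutive recipient paragraphs into a single destination element and drop their class attribute.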

Here is an Xhsc specification that will allow us to apply the three changes listed above. It is quite simple and short.

Example 2. Invoice HSC Specification


<?xml version="1.0"?>
<hsc:hsc xmlns:hsc="urn:nz-xml-hsc/version1">
  <hsc:head name="h1" level="1">
    <invoice></invoice>
  </hsc:head>
  <hsc:head name="h2" level="2" requires="h1">
    <invoiceitem></invoiceitem>
  </hsc:head>
  <hsc:group name="p" class="recipient" setclass="-" level="4">
    <destination></destination>
  </hsc:group>
</hsc:hsc>

Don't worry about what all the attributes mean for now. The important one is the name= attribute, which tells Xhsc which elements to look for in the input document. When we run Xhsc on the sample input, it produces the following output. (The command was java -jar hsc.jar sample1.xhtml invoice.hsc sample1out.xml.)

Example 3. Sample Xhsc Output


<?xml version="1.0" encoding="ASCII"?>
<html>
<head><title>Invoices for Dec 24</title></head>
<body>

   <invoice><h1>Invoice 1</h1>
    <destination><p>Bob Smith</p>
<p>123 Main Street</p>
<p>Annapolis, MD</p>
</destination>
    <invoiceitem><h2>item 1</h2>
<p>Teddy Bear</p><p>$14.99</p>
</invoiceitem>
    <invoiceitem><h2>item 2</h2>
<p>Yo-Yo</p><p>$5.98</p>
</invoiceitem>
  </invoice>
  
   <invoice><h1>Invoice 2</h1>
    <destination><p>Jane Doe</p>
<p>#18798661</p>
</destination>
    <invoiceitem><h2>item 1</h2>
<p>Tech Support Call</p><p>$55.00</p>
</invoiceitem>
  </invoice>
  </body></html>
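
To make the grouping in this output easier to follow, here is a minimal Python sketch of the same idea. It is not how Xhsc is implemented; it only mimics the visible effect of the three rules on the children of body. The names restructure, HEADS and GROUP are hypothetical, the requires= attribute and the level= of the group rule are ignored, and setclass="-" is assumed (from the output above) to mean "remove the class attribute".

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical rule tables mirroring Example 2 (not an Xhsc API).
HEADS = {"h1": ("invoice", 1), "h2": ("invoiceitem", 2)}  # tag -> (wrapper, level)
GROUP = ("p", "recipient", "destination")                 # tag, class, wrapper


def restructure(body):
    """Return a new body element whose flat children are nested in wrappers."""
    out = ET.Element(body.tag, body.attrib)
    stack = [(0, out)]  # open containers as (level, element); level 0 is body
    group = None        # currently open group wrapper, if any
    for original in body:
        child = copy.deepcopy(original)  # leave the input tree untouched
        is_member = child.tag == GROUP[0] and child.get("class") == GROUP[1]
        if not is_member:
            group = None  # any other element ends the current run
        if child.tag in HEADS:
            wrapper_name, level = HEADS[child.tag]
            # Close every open wrapper at the same or a deeper level.
            while stack[-1][0] >= level:
                stack.pop()
            wrapper = ET.SubElement(stack[-1][1], wrapper_name)
            stack.append((level, wrapper))
            wrapper.append(child)
        elif is_member:
            if group is None:
                group = ET.SubElement(stack[-1][1], GROUP[2])
            del child.attrib["class"]  # assumed meaning of setclass="-"
            group.append(child)
        else:
            stack[-1][1].append(child)  # plain content joins the innermost wrapper
    return out


SAMPLE = """<body>
<h1>Invoice 1</h1>
<p class="recipient">Bob Smith</p>
<p class="recipient">123 Main Street</p>
<h2>item 1</h2>
<p>Teddy Bear</p><p>$14.99</p>
<h1>Invoice 2</h1>
<p class="recipient">Jane Doe</p>
<h2>item 1</h2>
<p>Tech Support Call</p><p>$55.00</p>
</body>"""

result = restructure(ET.fromstring(SAMPLE))
print([child.tag for child in result])     # ['invoice', 'invoice']
print([child.tag for child in result[0]])  # ['h1', 'destination', 'invoiceitem']
```

The stack of open wrappers is what lets a new h1 close both the current invoiceitem and the current invoice. In the Xhsc specification, that bookkeeping appears to be what the level= attribute on the head rules expresses declaratively.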

If you are good at XSLT programming, you might try writing an XSLT stylesheet that performs the same task as the Xhsc specification. It isn't as easy as it looks.