User's Guide for Xhsc
This section explains the features that are supported in Xhsc specifications, and how you can employ these features using the two main Xhsc tags. Note that the content of an Xhsc specification is the same whether you use Xhsc as an application or as a package in another Java application.
An Xhsc specification can contain two main kinds of instructions: head instructions and group instructions. Each kind of instruction designates a kind of tag in the input document where you want to create another layer of hierarchy. Every instruction has a level or priority value associated with it. An instruction with a lower value is meant to enclose or dominate instructions with higher values.
A group instruction specifies elements of the input document that should be grouped together under a parent element. The example below shows a typical group transformation we might want to make: grouping several recipient elements under a recipientInfo element.
Example 4. Simple Group Transformation
INPUT:
<invoicename>Invoice 1</invoicename>
<recipient>Bob Smith</recipient><br/>
<recipient>Customer #<custno>155555</custno></recipient>
<recipient><phone>555-1212</phone></recipient>
<itemname>Teddy Bear</itemname>
OUTPUT:
<invoicename>Invoice 1</invoicename>
<recipientInfo>
  <recipient>Bob Smith</recipient><br/>
  <recipient>Customer #<custno>155555</custno></recipient>
  <recipient><phone>555-1212</phone></recipient>
</recipientInfo>
<itemname>Teddy Bear</itemname>
This kind of transformation is very easy to specify as an Xhsc instruction. We would use a group instruction similar to the one below.
Example 5. Simple Group Instruction
<hsc:group name="recipient" level="20">
  <recipientInfo></recipientInfo>
</hsc:group>
It is important to set the level attribute correctly on all your Xhsc instructions. The group instruction includes intervening tags in the group as long as they have the same or higher level values. For the example above, we must assume that the itemname tag was designated by some other instruction in the Xhsc specification as having a lower level than the recipient tag. Don't worry about the "hsc:" prefix on the tag; it is covered shortly below.
The other kind of Xhsc instruction is a head instruction. Use a head instruction to identify a tag in the input document that is the beginning of a sequence of elements that should be collected together under a new parent element. The example below shows a typical head transformation we might want to make: grouping an invoicename tag and everything that follows it within a new invoice tag.
Example 6. Simple Head Transformation
INPUT:
<invoicename>Invoice 1</invoicename>
<itemname>Teddy Bear</itemname>
<invoicename>Invoice 2</invoicename>
<itemname>Yo-Yo</itemname>
<itemname>Squeaky Duck</itemname>
OUTPUT:
<invoice>
  <invoicename>Invoice 1</invoicename>
  <itemname>Teddy Bear</itemname>
</invoice>
<invoice>
  <invoicename>Invoice 2</invoicename>
  <itemname>Yo-Yo</itemname>
  <itemname>Squeaky Duck</itemname>
</invoice>
This kind of transformation is also very easy to specify as an instruction, as shown in the example below.
Example 7. Simple Head Instruction
<hsc:head name="invoicename" level="2">
  <invoice></invoice>
</hsc:head>
Again, the level value associated with an instruction is critical. The head instruction includes the elements that follow it in the group as long as they have a higher level value. This makes it easy to express relationships that you would like to model in your output document: elements that should be deeper in the tree should have higher values for their level attributes.
In text input mode (-t option), Xhsc treats the plain text input as if it were an XML information set. This internally synthesized information set consists of a root element whose children are individual character data nodes, each containing a single line of text. You can specify the root element using an Xhsc root instruction, or you can let Xhsc create a root element for you, with the name "root".
Processing a plain text file to generate XML works a lot like processing XML data, except that the contents of the text lines, rather than element tags, indicate the structure.
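As a rough illustration of text input mode, the sketch below shows what a specification driven by line contents might look like. It relies on the match attribute described later in this section. The sample input lines, the regular expression, the level value, and the invoice wrapper element are all invented for this illustration, and the exact output Xhsc would produce has not been verified here.

```xml
<?xml version="1.0"?>
<!-- Hypothetical specification for text input mode (-t option).
     Assumed plain text input, one Text node per line:
         Invoice 1
         Teddy Bear
         Invoice 2
         Yo-Yo
-->
<hsc:hsc xmlns:hsc="urn:nz-xml-hsc/version1">
  <!-- A line containing "Invoice" followed by a number is treated as the
       head of a new invoice; the pattern is an assumption for this sketch. -->
  <hsc:head match="Invoice [0-9]+" level="2">
    <invoice></invoice>
  </hsc:head>
</hsc:hsc>
```

The intent is the same as in the head transformation shown earlier: each matching line starts a new collection. How lines that match no instruction are treated is not covered by this sketch.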
This sub-section describes the mechanics of expressing Xhsc instructions in an XML file. Xhsc is pretty lax about its XML document structure, but it does have some requirements that you must meet.
Xhsc is namespace-aware. All Xhsc tags must be in the namespace urn:nz-xml-hsc/version1. (For more information about namespaces, see [W3C.XNS].)
Xhsc instructions may appear anywhere in the specification XML document, but instructions may not be nested; in other words, no Xhsc group or head tag may be a child of another group or head tag.
Xhsc is non-validating. Unknown attributes are simply ignored. Unknown tags in Xhsc's namespace cause a warning message but do not prevent processing.
Each Xhsc instruction may have zero or one child elements. If it has one child element, the nodes that make up the collection are appended to the list of children of that element.
Every Xhsc instruction must have a level attribute and either a name attribute or a match attribute. Xhsc instruction attributes and their values are discussed in more detail in the next section.
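The child-element rule above can be pictured with a small sketch. The instruction is taken from the kind of head instruction used elsewhere in this guide. The output shape in the closing comment is inferred from the rule that collected nodes are appended to the children of the instruction's child element, and it has not been checked against an actual Xhsc run.

```xml
<!-- The single child element <item> already has one child of its own. -->
<hsc:head xmlns:hsc="urn:nz-xml-hsc/version1" name="itemname" level="6">
  <item>
    <itemno>0</itemno>
  </item>
</hsc:head>
<!-- Assumed shape of one collection in the output: the collected
     itemname node is appended after the existing itemno child.
       <item>
         <itemno>0</itemno>
         <itemname>Teddy Bear</itemname>
       </item>
-->
```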
The instructions of an Xhsc specification may be in their own document, or interspersed through some other kind of document (typically an XSLT stylesheet). When an Xhsc specification is the entire XML document, it is customary to make the root element of the document the tag hsc. The example below shows a simple Xhsc specification as a standalone XML document.
Example 8. A Small Complete Xhsc Specification
<?xml version="1.0"?>
<hsc:hsc xmlns:hsc="urn:nz-xml-hsc/version1">
  <hsc:head name="invoicename" level="2">
    <invoice></invoice>
  </hsc:head>
  <hsc:group name="recipient" level="20">
    <recipientInfo></recipientInfo>
  </hsc:group>
  <hsc:head name="itemname" level="6">
    <item>
      <itemno>0</itemno>
    </item>
  </hsc:head>
</hsc:hsc>
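For the other arrangement mentioned above, instructions interspersed through an XSLT stylesheet, a minimal sketch might look like the following. XSLT 1.0 permits top-level elements in a non-XSLT namespace, so the stylesheet stays valid. The identity template is only a placeholder, and how you organize the sharing of one file between Xhsc and an XSLT processor is an assumption of this sketch.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:hsc="urn:nz-xml-hsc/version1">

  <!-- Xhsc instruction placed as a top-level element of the stylesheet. -->
  <hsc:head name="invoicename" level="2">
    <invoice></invoice>
  </hsc:head>

  <!-- Ordinary XSLT identity template (placeholder, unrelated to Xhsc). -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- A second instruction; it is a sibling of the first, not nested in it. -->
  <hsc:group name="recipient" level="20">
    <recipientInfo></recipientInfo>
  </hsc:group>

</xsl:stylesheet>
```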
This section describes all the attributes that may be supplied with Xhsc head and group instruction tags. Some of the attributes identify input elements to be processed, and the others affect the processing that is carried out on them. Note that at most one head and one group instruction can match any single element of the input document (if there are multiple possible matches, then Xhsc always uses the one which appeared first in the specification).
The attributes below help identify input document elements to be processed.
name [string]
The value of this attribute is the local name of an input document element. An input document element is selected if its name matches the string. Note that matching against the input element name is case-insensitive. Either this attribute or the match attribute must appear on every group and head element in the specification.
match [string]
The value of this attribute is a regular expression that will be matched against Text nodes in the input document. A text node will be selected if a match for the regular expression appears anywhere inside it (non-anchored matching). The matching is always case-sensitive. Either this attribute or the name attribute must appear on every group and head element in the specification.
namespace [string, optional]
The value of this attribute is the namespace of an input document element. (IMPORTANT: the value of this attribute must be the full namespace name, NOT the prefix.) If this attribute appears, the input element is selected if its namespace matches the string value of the attribute; if this attribute does not appear, then the namespace of the input element is ignored. Note that the matching is case-insensitive.
classattr [string, optional]
The value of this attribute must be the name of an attribute that will appear in elements of the input document. If this attribute does not appear, then it has an implicit value of "class".
class [string, optional]
The value of this attribute must be a POSIX regular expression. If this attribute appears, then the input element is selected if its value for the attribute named by the classattr attribute matches the regular expression value of this attribute. If the input element does not have a value for the attribute named by classattr, and this attribute appears on the instruction, then the element cannot be selected.
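The selection attributes above can be combined on one instruction. The fragment below is a sketch only: the XHTML namespace name is the real one, but the element names, the role attribute, the class values, the level values, and the wrapper elements are all hypothetical.

```xml
<hsc:hsc xmlns:hsc="urn:nz-xml-hsc/version1">
  <!-- Select only <p> elements in the XHTML namespace. The namespace
       attribute holds the full namespace name, not a prefix. -->
  <hsc:head name="p"
            namespace="http://www.w3.org/1999/xhtml"
            class="chapter"
            level="2">
    <chapter></chapter>
  </hsc:head>

  <!-- Test a "role" attribute instead of the default "class" attribute. -->
  <hsc:group name="p" classattr="role" class="note" level="10">
    <notes></notes>
  </hsc:group>
</hsc:hsc>
```

In the second instruction, a p element with no role attribute cannot be selected, because the class attribute is present on the instruction.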
The following instruction attributes affect how an input element is processed once a matching instruction has been found.
level [number, mandatory]
The value of this attribute controls the priority of the selected input element in the Xhsc processing. Lower values for level are meant to enclose or dominate higher values. The value should be a non-negative floating-point number.
id [string, optional]
The value is the identifier for this instruction. If this attribute does not appear, then its value is taken to be the value of the mandatory name attribute.
requires [string, optional]
The value of this attribute is the identifier of some other instruction which this instruction needs to be its parent. During processing, if this rule is processed and its parent (immediately pending instruction) is not the designated rule, then Xhsc synthesizes an instance of the designated rule and processes it prior to processing this rule. (Note: this sounds complicated, but it is really simple in practice.)
setclass [string, optional]
This attribute should only appear when the class attribute also appears (otherwise it is ignored). During processing of this instruction, the value of the input element attribute named by the classattr attribute is set to the value of this attribute. (This facility is used to force all the attribute values that matched the regular expression to be the same.) If the value of this attribute is "-", then the input element attribute is deleted instead of being set.
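A short sketch of setclass, under the assumption that the input is XHTML-like markup with p elements. The class values, level values, and wrapper elements are made up for illustration.

```xml
<hsc:hsc xmlns:hsc="urn:nz-xml-hsc/version1">
  <!-- Class values that match the expression note.* are all rewritten
       to the single value "note" on the selected element. -->
  <hsc:group name="p" class="note.*" setclass="note" level="10">
    <notes></notes>
  </hsc:group>

  <!-- With the value "-", the class attribute of the selected element
       is deleted instead of being set. -->
  <hsc:head name="p" class="title" setclass="-" level="2">
    <section></section>
  </hsc:head>
</hsc:hsc>
```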
The descriptions and semantics of the various instruction attributes can be a little complex. Some examples may help to clarify how some of the attributes can be used.
If you are using Xhsc as part of the processing to generate XML that complies with a particular DTD or schema, there may be explicit rules about how elements may be nested. For example, your invoice DTD might specify that an itemname element can only appear inside an invoiceitem element, which can only appear inside an invoice element. However, the input you receive may not always supply all the right instruction matches to build the entire hierarchy correctly. The requires attribute on an Xhsc instruction can help ensure that your output document contains all the right nested elements. Here is a small example that shows how the requires attribute works.
Example 9. Using the requires Attribute
INPUT:
<?xml version="1.0"?>
<html>
<body>
<itemname>Teddy Bear</itemname>
<itemname>Yo-Yo</itemname>
</body>
</html>
Xhsc SPECIFICATION:
<?xml version="1.0"?>
<hsc:hsc xmlns:hsc="urn:nz-xml-hsc/version1">
  <hsc:head id="inv" name="invoicename" level="2">
    <invoice></invoice>
  </hsc:head>
  <hsc:group name="itemname" level="6" requires="inv">
    <item></item>
  </hsc:group>
</hsc:hsc>
OUTPUT:
<?xml version="1.0" encoding="ASCII"?>
<html>
<body>
<invoice>
<item><itemname>Teddy Bear</itemname>
<itemname>Yo-Yo</itemname>
</item>
</invoice>
</body>
</html>
Xhsc instructions match the elements of your input document primarily by the name. However, you can use the Xhsc instruction attributes described in this subsection to tighten the matching a little bit, by distinguishing the value of exactly one attribute of the input element. (Xhsc cannot perform the much more sophisticated matching of XSLT, because it does not support XPath expressions.)
To restrict an Xhsc instruction to be applied only to input elements with a particular value for an attribute, you must include the class attribute, and optionally the classattr attribute, on the Xhsc instruction. The classattr attribute tells Xhsc which input element attribute to consider (by default, "class"), and the class attribute tells it the value to be matched. This feature of Xhsc is intended primarily for matching CSS-styled XHTML, where the role of a paragraph tag p can be very different, depending on the value of its class attribute.
If your Java installation includes the java.util.regex regular expression package, then the class attribute that you supply in the Xhsc instruction will be taken as a POSIX regular expression, and anchored matching will be used to test whether the attribute value from the input document element matches the instruction. Otherwise, case-insensitive string matching will be used.
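The practical effect of anchored matching is sketched below. It assumes that "anchored" means the expression must match the whole attribute value, which is how anchoring is usually understood; the class values, level values, and wrapper elements are hypothetical.

```xml
<hsc:hsc xmlns:hsc="urn:nz-xml-hsc/version1">
  <!-- Under the whole-value assumption, this selects class="title"
       but not class="subtitle". -->
  <hsc:head name="p" class="title" level="2">
    <section></section>
  </hsc:head>

  <!-- To select class="subtitle", the expression has to cover the
       entire value. -->
  <hsc:head name="p" class="sub.*" level="4">
    <subsection></subsection>
  </hsc:head>
</hsc:hsc>
```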