Using XML as a Language Interface for AI Applications

Said Tabet, Prabhakar Bhogaraju, and David Ash

MindBox Inc., 300 Drake’s Landing, Suite 155,

Greenbrae, CA 94904, USA

Abstract. One of the key advantages of XML is that it allows developers, through the use of DTD files, to design their own languages for solving different problems. At the same time, one of the biggest challenges in using rule-based AI solutions is that they force the developer to cast the problem in particular, AI-specific languages that are awkward to interface with the rest of the system. We show in this paper how XML changes this, both by allowing the development of languages suited to particular AI problems and by providing a seamless interface with the rule engine. We show that the input and output of an AI application, and even its rules, can be given as XML files, so that the software engineer need not invest considerable time and effort in building complex conversion procedures. As the problem to be solved changes, the developer can change the language used to solve it, and the interface is updated automatically. We illustrate our ideas with an example drawn from the mortgage industry, showing how an AI application can directly underwrite a loan given an XML file as input and produce an XML file as output.

1. Introduction

XML (eXtensible Markup Language) is a metalanguage for representing structured data on the Web (World Wide Web Consortium). It is a metalanguage in the sense that it supports a Document Type Definition (DTD), which is used to declare a specific language for solving particular problems. A DTD, either contained in a separate file or embedded in the XML document, defines the elements and attributes of a particular markup language.
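As an illustration of how a DTD declares a language, the following Java sketch uses the standard JAXP DOM parser to check documents against an internal DTD. The LOAN_REQUEST element is a hypothetical example language; only the attribute names DATA_ID and LOAN_AMOUNT are taken from Listing 1.0. This is not the parser described in Section 3.2, just a minimal demonstration of DTD validation.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class DtdCheck {
    // Hypothetical loan-request language declared in an internal DTD subset.
    static final String DTD =
          "<!DOCTYPE LOAN_REQUEST [\n"
        + "  <!ELEMENT LOAN_REQUEST EMPTY>\n"
        + "  <!ATTLIST LOAN_REQUEST\n"
        + "      DATA_ID     CDATA #REQUIRED\n"
        + "      LOAN_AMOUNT CDATA #REQUIRED>\n"
        + "]>\n";

    // Turn validity errors into exceptions instead of printing and continuing.
    private static final ErrorHandler STRICT = new ErrorHandler() {
        public void warning(SAXParseException e) { }
        public void error(SAXParseException e) throws SAXException { throw e; }
        public void fatalError(SAXParseException e) throws SAXException { throw e; }
    };

    /** Returns true if the document is well-formed and valid against its DTD. */
    static boolean isValid(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // enable DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(STRICT);
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // malformed, or does not conform to the DTD
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A document that omits the required LOAN_AMOUNT attribute is rejected, which is how a DTD restricts documents to one specific language.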

As a metalanguage, XML can be used to define a variety of markup languages (MLs). Examples include the Synchronized Multimedia Integration Language (SMIL), the Personal Information Description Language (PIDL), the eXtensible Forms Description Language (XFDL), and many others (World Wide Web Consortium 2).

Because artificial intelligence has traditionally required specialized languages to solve problems (e.g., ART*Enterprise, ART-IM, ART, TIRS, Lisp, Prolog, BB1), a metalanguage suitable for defining multiple specialized problem-solving languages should find considerable application in the AI field. Indeed, XML has already been applied to some AI problems. Hayes and Cunningham proposed the Case Based Markup Language (CBML) (Hayes and Cunningham, 1998), an XML application for data represented as cases. CBML was proposed to facilitate markup of knowledge and data that could be readily reused by intelligent agents. Hayes and Cunningham (1999) discuss the limitations of the CBML approach in their work on presenting a case view, arguing for techniques that integrate easily with existing markup structures.

Another effort is the Artificial Intelligence Markup Language (AIML) (The XML Cover Pages 2), an XML-based language used in ALICE, a chat bot. AIML offers a simple yet specialized open-source representation for conversational agents, with a minimalist DTD built around specific XML tags such as patterns and categories. Still another example is DMML (Kambhatla et al., 2000), a markup language designed for intelligent agent communication and applied to online stock trading.

In both of these instances, the idea is that chat bots or other intelligent agents implemented in different domains can share the same generic structure, making such entities easier to program. However, this approach restricts an XML message to a specific set of tags and attributes. This compromises the generic appeal of XML and quickly leads to highly specialized variants of the markup language, which brings us back to the original problem: AI applications require specialized and often awkward representation schemes for data input and output.

In this paper we show that XML by itself is an appropriate tool for building AI applications. The onus is on the application architecture to leverage the strengths of XML technology in solving a problem with AI techniques. Hayes and Cunningham (1999) demonstrated this in their work on CBR applications that use standard XML documents and generate a usable XML view of a company’s knowledge system. In our case, we use XML to help build rule-based applications for deployment on the Web, especially in mortgage-related domains.

2. Problem Description

Successful e-business solutions require support for Internet standards including HTML and HTTP, integration with web application servers, XML support, a robust communications infrastructure, and scalability for web-based demand (Gold-Bernstein, 1999). The World Wide Web currently contains millions of HTML documents that form a massive repository of data. However, it is difficult for an e-business solution to take advantage of that source because of the general chaos that pervades the Web. All businesses need to provide quality customer service, and e-businesses are no exception. Quality customer service requires intelligence, and hence AI must be built into such systems.

XML is an appropriate way to represent the semi-structured data present on the Internet. Expressing semantics in XML syntax rather than in first-order logic leads to a simpler evaluation function and requires no agreement on the associated ontologies (Glushko et al., 1999). XML imposes some structure while, as a metalanguage, permitting different structures to be used for different problems and data to be presented in different forms within a single domain (for example, an XML document is much less structured than a table in a relational database). We therefore expect the influence of XML to increase, and to realize the goal of intelligent customer service, a robust interface between XML and rule-based systems must be built. We have built such an interface, and the remainder of this paper describes this effort.

3. XML and ART*Enterprise®

To explore the use of XML for rule-based application development, we evaluated one commercially available rule-based application development product and developed a prototype underwriting application that uses XML to represent its input and output data. We chose this product because of its availability and its widespread use in the mortgage industry.

ART*Enterprise® (A*E), a product from MindBox[1] Inc., is an integrated knowledge-based application development environment that supports rule-based, case-based, object-oriented, and procedural representation of and reasoning about domain knowledge. A*E offers cross-platform support for most operating systems, windowing systems, and hardware platforms (Watson, 1997). The product integrates seamlessly with industry-standard programming languages such as C/C++ and offers CORBA and Web features that allow the rule engine to communicate with components written in Java or any other language.

3.1 High level architecture


A typical component-based architecture for an e-commerce application has three tiers. The thin client layer is the user interface, implemented using dynamically generated HTML. The user interface runs within popular web browsers (Netscape Navigator, MS Internet Explorer, etc.) that embed a Java virtual machine. The middle tier includes the web server, the application server with a servlet engine, the A*E rule engine server, and the A*E-XML parser. A database back end forms the final tier. Figure 1 depicts this architecture.

The application server uses Enterprise JavaBeans (EJB) to communicate seamlessly with the back-end processes, the XML APIs, and other protocols (such as CORBA IIOP). In our design, the server listens for user HTTP requests and delegates A*E/XML requests to the specialized application servlet. The servlet processes the HTTP request and passes the results on to the A*E-XML parser. This component combines a Document Object Model (DOM) parser based on the XML 1.0 specification with the A*E rule engine.


Figure 1. High Level Architecture

3.2 The XML Parser

There are a variety of XML parsers available free of charge on the Internet. For this application, however, we elected to build our own parser rather than use an existing one. To explain why, we first describe what we hoped to accomplish with the XML parser. The idea was to take an XML file and convert it into A*E objects. We wished, however, to first produce Java objects as an intermediate step, to help ensure compatibility with applications and environments in which Java is a predominant technology. Producing Java objects first makes the contents of the XML file available to any existing Java classes. It also allows the XML files to be preprocessed within Java; if, for example, some form of semantic validation of the XML file is needed, it can be done in Java.
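The kind of Java-side semantic validation mentioned above might look like the following sketch. The classes XmlAttribute and XmlDoc are hypothetical stand-ins for the parser's generated Java objects, and the rule that LOAN_AMOUNT must be a positive number is an assumed example of a check that a DTD cannot express, since a CDATA attribute value is just a string.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the Java objects the XML-to-Java parser produces.
class XmlAttribute {
    final String name;
    final String value;

    XmlAttribute(String name, String value) {
        this.name = name;
        this.value = value;
    }
}

class XmlDoc {
    final List<XmlAttribute> attributes = new ArrayList<>();

    /** Applies semantic checks a DTD cannot express; returns the problems found. */
    List<String> validate() {
        List<String> problems = new ArrayList<>();
        for (XmlAttribute a : attributes) {
            if (!a.name.equals("LOAN_AMOUNT")) continue;
            try {
                double amount = Double.parseDouble(a.value);
                if (!(amount > 0)) { // also rejects NaN
                    problems.add("LOAN_AMOUNT must be positive: " + a.value);
                }
            } catch (NumberFormatException e) {
                problems.add("LOAN_AMOUNT is not a number: " + a.value);
            }
        }
        return problems;
    }
}
```

Returning a list of problems, rather than stopping at the first one, lets the servlet report every issue back to the user in a single response.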

Once the Java objects are produced, and validated as needed, the next step is to invoke a method on the top-level object to emit the A*E code. We built our own parser so that the Java objects would be structured in a way that makes it easy to add methods that emit A*E code. If a clear standard XML-to-Java parser were to emerge that readily permitted adding such methods to the generated classes, it would certainly be a suitable alternative to our own parser.

The XML-to-Java parser is implemented using JavaCC, a compiler-compiler (parser generator) that makes it easy to build parsers that produce Java objects as their output. The parser generated by JavaCC is itself a Java class and can be invoked from anywhere within the Java virtual machine, for example from a servlet or a JSP page. It also checks the validity of the XML against the DTD and produces a tree of Java objects. Once the Java objects have been created, a method is invoked on the top-level document object that traverses the tree of Java objects and generates a text file containing the corresponding A*E code. This code can then be loaded into the A*E application.
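The emitting step can be sketched as a depth-first walk over an element tree that writes one define-instance form per XML attribute, in the style of Listing 1.0. This is a simplified sketch under several assumptions: the JDK's built-in DOM parser stands in for the JavaCC-generated parser, only the xml:Has-Name and xml:value slots are emitted, one caller-supplied timestamp is used for the whole run, and attribute values are assumed to contain no double quotes. AeEmitter is a hypothetical name.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class AeEmitter {
    private final StringBuilder out = new StringBuilder();
    private final String stamp; // one timestamp for the whole run (a simplification)
    private int counter = 0;    // running counter that keeps names unique within a run

    private AeEmitter(String stamp) {
        this.stamp = stamp;
    }

    /** Parses the XML and returns one A*E define-instance form per attribute. */
    static String emit(String xml, String stamp) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse XML", e);
        }
        AeEmitter emitter = new AeEmitter(stamp);
        emitter.walk(doc.getDocumentElement());
        return emitter.out.toString();
    }

    // Depth-first traversal: emit this element's attributes, then recurse into child elements.
    private void walk(Element element) {
        NamedNodeMap attrs = element.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node attr = attrs.item(i);
            String id = "xml:Attribute" + (++counter) + "-" + stamp;
            // Assumes attribute values contain no double quotes.
            out.append("(define-instance ").append(id).append(" xml:Attribute\n")
               .append("  (xml:Has-Name \"").append(attr.getNodeName()).append("\")\n")
               .append("  (xml:value \"").append(attr.getNodeValue()).append("\"))\n");
        }
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                walk((Element) child);
            }
        }
    }
}
```

The resulting string can be written to a text file and loaded into A*E, mirroring the flow described above.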

4. Example

To demonstrate the feasibility of the proposed architecture, we built an XML-enabled underwriting application using A*E. The underwriting process is simple: loan eligibility is determined from the front and back ratios. Data input and output are in XML. Figure 2 shows a schematic of the prototype.

Figure 2.  Application Schematic

4.1 Input

The input form is an HTML user interface accessible from a web browser. The form layout is simple, with a few key input fields for collecting the user's loan application data. The form data is submitted from the web browser to the web server over the simple yet robust HTTP protocol and is then processed by a servlet residing on the web server.
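One possible way for the servlet to package the submitted form fields as an XML document is sketched below with the JDK's DOM and serialization APIs. To keep the example runnable outside a servlet container, the servlet request is replaced by a plain Map of field names to values; the root element name LOAN_REQUEST and the one-attribute-per-field layout are assumptions, not the format used by the prototype.

```java
import java.io.StringWriter;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class FormToXml {
    /** Turns submitted form fields into one element whose attributes hold the values. */
    static String toXml(String rootName, Map<String, String> formFields) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement(rootName);
            // TreeMap gives a stable, sorted attribute order.
            for (Map.Entry<String, String> field : new TreeMap<>(formFields).entrySet()) {
                root.setAttribute(field.getKey(), field.getValue());
            }
            doc.appendChild(root);
            // Serializing through a Transformer escapes special characters such as '&'.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter w = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(w));
            return w.toString();
        } catch (Exception e) {
            throw new IllegalStateException("cannot build XML from form data", e);
        }
    }
}
```

Building the document through DOM rather than by string concatenation means user-typed characters such as & or quotes cannot produce malformed XML.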

The parser receives the XML document from the servlet, parses its contents, and creates A*E objects. Listing 1.0 shows an example input and the corresponding application objects.


Example XML input to the parser:





Example A*E code generated by the parser for the XML input above:

(define-instance xml:Attribute9-19991215222255155 xml:Attribute
  (xml:Has-Name "DATA_ID")
  (xml:Has-AttValue xml:AttValue11-19991215222255185)
  (xml:ownerDocument xml:Document1-19991215222254234)
  (xml:value "1001"))

(define-instance xml:Attribute12-19991215222255245 xml:Attribute
  (xml:Has-Name "LOAN_AMOUNT")
  (xml:Has-AttValue xml:AttValue14-19991215222255265)
  (xml:ownerDocument xml:Document1-19991215222254234)
  (xml:value "100000.0"))


Listing 1.0: XML input to the parser and the corresponding A*E code


Note that timestamps are appended to A*E object names to guarantee uniqueness even if multiple generated files are loaded into the same A*E image.
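The names in Listing 1.0 appear to combine a type name, a running counter, and a 17-digit timestamp down to the millisecond (for example, 19991215222255155 reads as 1999-12-15 22:22:55.155). Assuming that interpretation, a name generator could look like the sketch below; AeNames is a hypothetical helper, not part of A*E.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

class AeNames {
    // 17 digits: year, month, day, hour, minute, second, millisecond.
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmssSSS");

    /** Builds a name such as xml:Attribute9-19991215222255155. */
    static String name(String type, int counter, LocalDateTime when) {
        return "xml:" + type + counter + "-" + when.format(STAMP);
    }
}
```

The counter distinguishes objects created within the same millisecond, while the timestamp distinguishes objects from different generated files.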

4.2 The Rule Engine

The application loads the data produced by the parser, which is made available as schemas in Art*Script. The rule engine first computes the mortgage payment (principal and interest) from the loan amount, the interest rate, and the loan term. It then determines the eligibility of the case from two ratios: (i) the front ratio, monthly housing expenses to monthly income, and (ii) the back ratio, total monthly expenses (housing plus other debts and commitments) to monthly income. A simple threshold criterion determines the eligibility of a loan: the front ratio must be no more than 28% and the back ratio less than 36%. When processing is complete, the application produces a recommendation and a set of XML output objects; the recommendation and the computed ratios are both part of this output data.
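The computation can be sketched in Java as follows. The payment uses the standard fixed-rate amortization formula M = P·r / (1 − (1 + r)^(−n)), where P is the loan amount, r the monthly interest rate, and n the number of monthly payments; the exact formula used by the A*E rules is not given here, so treat this as an assumption. Monthly housing expenses and other debts are supplied by the caller, and the Underwriting class is hypothetical.

```java
class Underwriting {
    /** Monthly principal-and-interest payment for a fixed-rate, fully amortizing loan. */
    static double monthlyPayment(double loanAmount, double annualRatePercent, int termYears) {
        double r = annualRatePercent / 100.0 / 12.0; // monthly interest rate
        int n = termYears * 12;                       // number of monthly payments
        if (r == 0.0) {
            return loanAmount / n;                    // no interest: straight division
        }
        return loanAmount * r / (1.0 - Math.pow(1.0 + r, -n));
    }

    /** Thresholds from the text: front ratio at most 28%, back ratio below 36%. */
    static boolean eligible(double monthlyHousing, double otherMonthlyDebts, double monthlyIncome) {
        double front = monthlyHousing / monthlyIncome;
        double back = (monthlyHousing + otherMonthlyDebts) / monthlyIncome;
        return front <= 0.28 && back < 0.36;
    }
}
```

For example, a $100,000 loan at 7% over 30 years gives a payment of about $665.30; with a monthly income of $3,500 and $300 of other debts, the front ratio is about 19% and the back ratio about 28%, so the loan passes both thresholds.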

The result, an XML document, is now available either for display or for further processing.  Listing 2.0 shows the output from the application.

<?xml version="1.0"?>





<FILE_ID_SECTION DATA_ID="1001" FILE_TYPE="XML version 1.0" DATE="Fri Jan 14 13:02:42 2000" AUTHOR="BRIGHTWARE">









Listing 2.0: XML document output from the A*E application


For now, to display the output in a client browser, we use the same application servlet to convert the XML into HTML.
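The conversion method is not specified here; one standard option is XSLT, available in Java through javax.xml.transform, and the sketch below shows that approach as an illustrative alternative. The stylesheet is hypothetical and deliberately generic: it renders every attribute of the root element (for example, FILE_ID_SECTION in Listing 2.0) as a row of an HTML table.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XmlToHtml {
    // Hypothetical stylesheet: one table row per attribute of the root element.
    static final String XSL =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='html'/>"
        + "<xsl:template match='/'>"
        + "<html><body><table>"
        + "<xsl:for-each select='/*/@*'>"
        + "<tr><td><xsl:value-of select='name()'/></td><td><xsl:value-of select='.'/></td></tr>"
        + "</xsl:for-each>"
        + "</table></body></html>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Applies the stylesheet to an XML string and returns the resulting HTML. */
    static String toHtml(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("XSLT conversion failed", e);
        }
    }
}
```

Keeping the presentation in a stylesheet would let the HTML layout change without touching the servlet code.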

5. Conclusion

XML is an appropriate way to represent the semi-structured data present on the Internet. One important consideration is the need to standardize parsers in the AI-XML world. Standardization would let us focus on solving problems rather than, for example, choosing among competing parsers and investing considerable time debating which one to use.

A similar need for standardization arises in designing specialized markup languages based on XML. Although XML can be extended by designing other forms of markup, it is undesirable for a separate language to be developed every time XML is used. Nevertheless, sizeable vertical industry segments, such as the mortgage industry, would benefit from a specialized markup of the terms relevant to their domain.

The use of XML does not, however, completely eliminate the need for building domain-specific application input layers. In our example, the parser generates A*E objects, but the format of these objects is the same across all domains. If we have specific information about a given domain, that information can be used to write A*E rules that massage the objects into a format appropriate for the domain. Alternatively, additional methods could be written on the parser’s generated Java objects to produce more domain-specific objects. None of this is strictly necessary, since rules can be written against the objects as they are, but such an enhancement would make it easier to write rules for a specific domain.
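The second alternative, adding methods that turn the generic objects into domain-specific ones, might look like the sketch below. LoanApplication, the loan-application schema, and the loan-amount slot are hypothetical names, and the syntax of the emitted A*E instance is assumed by analogy with Listing 1.0.

```java
import java.util.Map;

// Hypothetical domain-specific object for the mortgage example.
final class LoanApplication {
    final String dataId;
    final double loanAmount;

    LoanApplication(String dataId, double loanAmount) {
        this.dataId = dataId;
        this.loanAmount = loanAmount;
    }

    /** Builds the domain object from the generic name/value pairs the parser produces. */
    static LoanApplication fromAttributes(Map<String, String> attrs) {
        String id = attrs.get("DATA_ID");
        String amount = attrs.get("LOAN_AMOUNT");
        if (id == null || amount == null) {
            throw new IllegalArgumentException("DATA_ID and LOAN_AMOUNT are required");
        }
        return new LoanApplication(id, Double.parseDouble(amount));
    }

    /** Emits one domain-level A*E instance instead of one generic object per attribute. */
    String toAe() {
        return "(define-instance loan-" + dataId + " loan-application\n"
             + "  (loan-amount " + loanAmount + "))";
    }
}
```

Rules can then match on loan-amount directly instead of searching generic xml:Attribute objects by name.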

6. Future Directions

Organizations implementing e-commerce applications are rapidly adopting XML as their de facto standard for data transfer between applications and partners. Dedicated industry groups such as the Mortgage Industry Standards Organization have begun standardizing the XML transaction architecture for their clientele. Key players in the technology sector, such as IBM (IBM) and Sun (Sun Microsystems), are focusing on implementing and integrating rule engines as part of their enterprise e-commerce architectures. These efforts underscore both the need for the XML-to-rule-engine interface described in this paper and the need for a domain-specific application input layer.

For knowledge-based systems to be practical and offer effective solutions to industry, it is imperative that the products themselves be extensible and that both the products and knowledge engineers take advantage of the features uniquely offered by XML.

7. References

1. Flynn, P., University College Cork, Internet Web Site.

2. Glushko, R., Tenenbaum, J., and Meltzer, B., An XML Framework for Agent-Based E-commerce, Communications of the ACM, 42:3, March 1999.

3. Gold-Bernstein, B., From EAI to e-AI, Application Development Trends, 6:12, 1999.

4. Hayes, C., and Cunningham, P., Distributed CBR using XML, Proceedings of the Workshop: Intelligent Systems and Electronic Commerce, Bremen, 1998.

5. Hayes, C., and Cunningham, P., Shaping a CBR View with XML, Technical Report TCD-CS-1999-23, Trinity College Dublin, 1999.

6. Kambhatla, N., Budzikowska, M., Levesque, S., Nicolov, N., Zadrozny, W., Wicha, C., and MacNaught, J., DMML: An XML Language for Interacting with Multi-Modal Dialog Systems, Proceedings of the Twelfth Conference on Innovative Applications of Artificial Intelligence, Austin, Texas, August 2000.

7. Watson, I., Applying Case-Based Reasoning: Techniques for Enterprise Systems, Morgan Kaufmann Publishers Inc., 1997.

8. IBM, Internet Web Site.

9. Mortgage Industry Standards Organization, Internet Web Site.

10. Sun Microsystems, Internet Web Site.

11. World Wide Web Consortium, Internet Web Site.

12. World Wide Web Consortium 2, Internet Web Site.

13. Internet Web Site.

14. The XML Cover Pages, Internet Web Site.

15. The XML Cover Pages 2, Internet Web Site.



[1] Formerly part of the Inference and Brightware corporations.