Using XML as a Language Interface for AI Applications
Said.Tabet@mindbox.com, Prabhakar.Bhogaraju@mindbox.com, david.ash@mindbox.com
MindBox Inc., 300 Drake’s Landing, Suite 155,
Abstract. One of the key advantages of XML is that it allows developers, through the use of DTD files, to design their own languages for solving different problems. At the same time, one of the biggest challenges in using rule-based AI solutions is that they force the developer to cast the problem in particular, AI-specific languages that are awkward to interface with the rest of the system. We show in this paper how XML changes this by allowing the development of languages suited to particular AI problems and by providing a seamless interface with the rule engine. We show that the input and output of an AI application, and even the rules themselves, can be given as XML files, so the software engineer avoids investing considerable time and effort in building complex conversion procedures. As the problem to be solved changes, the developer can change the language used to solve the problem, and the interface is updated automatically. We illustrate our ideas with an example drawn from the mortgage industry, showing how an AI application is able to directly underwrite a loan, taking an XML file as input and producing an XML file as output.
1. Introduction
XML (eXtensible Markup Language) is a metalanguage for representing structured data on the Web (World Wide Web Consortium, http://www.w3.org/XML). It is a metalanguage in the sense that it includes a Document Type Definition (DTD) mechanism that is used to declare a specific language for solving particular problems (XML.com, http://www.xml.com/pub/98/10/guide2.html). A DTD is a set of declarations, either contained in a separate file or embedded in the XML document, which allows one to define various markup languages.
As a metalanguage, XML can be used to define a variety of different markup languages (MLs). Examples of the markup languages that XML has been used to define include Synchronized Multimedia Integration Language (SMIL), Personal Information Description Language (PIDL), eXtensible Forms Description Language (XFDL), and many others (World Wide Web Consortium 2, http://www.w3.org/TR).
Because artificial intelligence has traditionally required specialized languages in order to solve problems (e.g. ART*Enterprise, ART-IM, ART, TIRS, Lisp, Prolog, BB1), the availability of a metalanguage suitable for defining multiple specialized languages should allow for considerable application in the AI field. Indeed, there have been some applications of XML to AI problems. Hayes and Cunningham proposed Case Based Markup Language (CBML) (Hayes and Cunningham, 1998), an XML application for data represented as cases. CBML was proposed to facilitate knowledge and data markup that could be readily reused by intelligent agents. Limitations of the CBML approach are discussed by Hayes and Cunningham (1999) in their work on presenting a case view, in which they make a case for techniques that can integrate easily with existing mark-up structures.
Another effort is the Artificial Intelligence Markup Language (AIML) (The XML Cover Pages 2, http://www.oasis-open.org/cover/aiml-ALICE.html). AIML is an XML-based language used in ALICE, a chat bot. It offers a simple yet specialized open-source representation alternative for conversational agents. The language has a minimalist DTD and relies on specific XML tags such as patterns and categories. Still another example is DMML (Kambhatla et al., 2000), a markup language designed for intelligent agent communication and applied to online stock trading.
In both these instances, the idea is that chat bots or other intelligent agents can share the same generic structure across domains and implementations, thus making it easier to program such entities. However, such an approach restricts the composition of an XML message to a specific set of tags and attributes. This compromises the generic appeal of XML and quickly forms the basis for highly specialized variations of the markup language, leading us back to the initial problem: AI applications require specialized and often awkward representation schemes for data input and output.
In this paper we show that XML by itself is an appropriate tool for building AI applications. The onus is on the application architecture to leverage the strengths of XML technology in solving a problem using AI methods. Hayes and Cunningham (1999) demonstrated this in their work on CBR applications that use standard XML documents and generate a usable XML view of a company’s knowledge system. In our case, we use XML to help build rule-based applications for deployment on the Web, especially in mortgage-related domains.
2. Problem Description
Successful e-business solutions require support for Internet standards including HTML and HTTP, integration with web application servers, XML support, a robust communications infrastructure, and scalability for web-based demand (Gold-Bernstein, 1999). Currently, the World Wide Web contains millions of HTML documents that together form a massive repository of data. However, it is difficult for an e-business solution to take advantage of that source because of the general lack of structure that pervades the Web. There is a need in all businesses for quality customer service, and e-businesses are no exception. To provide quality customer service, intelligence is required, and hence AI (artificial intelligence) must be built into such systems.
XML is an appropriate way of representing the semi-structured data that is present on the Internet. Expressing semantics in XML syntax rather than in first-order logic leads to a simpler evaluation function while needing no agreement on the associated ontologies (Glushko et al., 1999). XML allows for some structure while being a “meta-language”, which permits different structures to be used for different problems and data to be presented in different forms within a single domain (for example, an XML document is much less structured than a table in a relational database). Thus we see XML as a technology whose influence will increase, and hence to realize the goal of intelligent customer service, a robust interface between XML and rule-based systems must be built. We have built just such an interface, and the goal of the remainder of this paper is to describe this effort.
3. XML and ART*Enterprise®
To explore the use of XML for rule-based application development, we examined one commercially available rule-based application development product and developed a prototype underwriting application using XML as the representation of choice for input and output data. Our choice of software was based on the availability of the product and its widespread usage in the mortgage industry.
ART*Enterprise® (A*E), a product from MindBox Inc., is an integrated knowledge-based application development environment that supports rule-based, case-based, object-oriented and procedural representation and reasoning of domain knowledge. A*E offers cross-platform support for most operating systems, windowing systems, and hardware platforms (Watson, 1997). The product allows seamless integration with industry standard programming languages like C/C++ and offers CORBA and Web features that allow the rule engine to communicate with components written in Java or any other language.
3.1 High level architecture
A typical component-based architecture for an e-commerce application is usually composed of three tiers. The thin client layer is represented by the user interface, implemented using dynamically generated HTML. The user interface runs within popular web browsers (Netscape Navigator, MS Internet Explorer, etc.) embedding a Java virtual machine. The middle tier includes the web server, the application server with a servlet engine, the A*E rules engine server and the A*E-XML parser. A database back-end forms the final layer in this architecture. Figure 1 depicts this architecture.
The application server uses Enterprise Java Beans (EJB) to communicate seamlessly with the back-end process, the XML APIs and other protocols (such as CORBA IIOP). In our design, the server listens for user HTTP requests and delegates A*E/XML requests to the specialized application servlet. The servlet processes the HTTP request and passes the results on to the A*E-XML parser. This component is an implementation of the Document Object Model (DOM) parser based on the XML 1.0 specification, together with the A*E rule engine.

Figure 1. High Level Architecture
A variety of XML parsers are available free of charge on the Internet. For this application, we elected to build our own parser rather than use an existing one. To
understand our reason for doing so, we should first explain what we hoped to
accomplish with the XML parser. The
idea was to take an XML file and convert it into A*E objects. We wished, however, to first produce Java
objects as an intermediate step towards producing the A*E objects. The reason
for this was to help ensure compatibility with applications and environments in
which Java is a predominant technology.
Producing Java objects as a first step would make the XML file available
to any Java classes which might exist.
It would also allow preprocessing of the XML files within Java. If, for example, some form of semantic
validation on the XML file is needed, this could be done in Java.
Once the Java objects are produced, and
validated as needed, the next step is to invoke a method on the top level
object to emit the A*E code. As a
result, we chose to build our own parser in order that the Java objects be
structured in such a way as to easily permit building methods to emit A*E
code. However, if a clear standard
XML-to-Java parser were to emerge, which readily permitted adding methods to
the generated classes so as to allow the emitting of A*E code, that would
certainly be a suitable alternative to using our own parser.
The XML-to-Java parser is implemented using
JavaCC. This compiler-compiler
technology readily permits the building of parsers, which produce Java objects
as their output. The compiler produced
using JavaCC is itself a Java class and can be invoked from anywhere within the
Java virtual machine, for example from a servlet or a JSP page. The compiler also checks the validity of the
XML against the DTD and produces a tree of Java objects. Once the Java objects have been created, a
method is invoked on the top-level document object which searches through the
tree of Java objects in the document and generates a text file that contains
the appropriate A*E code. This code is
then available to be loaded into the A*E application.
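The first half of this pipeline, validating the XML against its DTD and obtaining a tree of Java objects to search through, can be sketched as follows. This is only an illustration under stated assumptions: it uses the DOM parser bundled with the JDK (`javax.xml.parsers`) in place of our JavaCC-based parser, and the class name `XmlToObjectsSketch`, the inline DTD, and the `ARM` loan type are hypothetical; only the element and attribute names are taken from the example in Section 4.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

final class XmlToObjectsSketch {

    // Hypothetical input document. The DTD in the internal subset is an
    // assumption made for this sketch, not the DTD used by the prototype.
    static final String SAMPLE =
        "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE INPUT-OBJECT [\n"
        + "  <!ELEMENT INPUT-OBJECT EMPTY>\n"
        + "  <!ATTLIST INPUT-OBJECT\n"
        + "    DATA_ID CDATA #REQUIRED\n"
        + "    LOAN_AMOUNT CDATA #REQUIRED\n"
        + "    LOAN_TYPE (FRM|ARM) #REQUIRED>\n"
        + "]>\n"
        + "<INPUT-OBJECT DATA_ID=\"1001\" LOAN_AMOUNT=\"100000.0\" LOAN_TYPE=\"FRM\"/>\n";

    /** Parses the XML and validates it against the DTD named in its DOCTYPE. */
    static Document parseValidated(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // turn on DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Treat validity errors as failures instead of the default logging.
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { /* ignored here */ }
                @Override public void error(SAXParseException e) throws SAXException { throw e; }
                @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException | IOException | ParserConfigurationException e) {
            throw new IllegalArgumentException("XML rejected: " + e.getMessage(), e);
        }
    }

    /** Walks the object tree and records every attribute as "element/name=value". */
    static void collect(Node node, List<String> out) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            NamedNodeMap attrs = node.getAttributes();
            for (int i = 0; i < attrs.getLength(); i++) {
                Node a = attrs.item(i);
                out.add(node.getNodeName() + "/" + a.getNodeName() + "=" + a.getNodeValue());
            }
        }
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            collect(c, out);
        }
    }

    public static void main(String[] args) {
        List<String> found = new ArrayList<>();
        collect(parseValidated(SAMPLE), found);
        found.forEach(System.out::println);
    }
}
```

Because the result is an ordinary tree of Java objects, any semantic validation or preprocessing can be added as plain Java code between `parseValidated` and the step that emits A*E code.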
4. Example
To demonstrate the feasibility of the proposed architecture, we developed an XML-enabled underwriting application using A*E. The underwriting process is simple: loan eligibility is determined based on the front and back ratios. Data input and output are in XML. Figure 2 shows a schematic of the prototype.

Figure 2. Application Schematic
4.1 Input
The input form is an HTML user interface accessible using a web browser. The form layout is simple, with a few key input fields for collecting user input for a loan application. The form data is submitted from the web browser to the web server via HTTP. The data is then processed by a servlet residing on the web server.
The parser receives the XML document from the
servlet, parses the contents and creates A*E objects. Listing 1.0 shows input and corresponding application objects.
Example XML code input to the parser:
<INPUT-OBJECT DATA_ID="1001"
  LOAN_AMOUNT="100000.0"
  LOAN_TYPE="FRM" …
Example A*E code generated by the parser for the XML input above:
(define-instance xml:Attribute9-19991215222255155
  xml:Attribute
  (xml:Has-Name "DATA_ID")
  (xml:Has-AttValue xml:AttValue11-19991215222255185)
  (xml:ownerDocument xml:Document1-19991215222254234)
  (xml:value "1001")
)
(define-instance xml:Attribute12-19991215222255245
  xml:Attribute
  (xml:Has-Name "LOAN_AMOUNT")
  (xml:Has-AttValue xml:AttValue14-19991215222255265)
  (xml:ownerDocument xml:Document1-19991215222254234)
  (xml:value "100000.0")
)
Listing 1.0: XML code input to the parser and the A*E code generated from it
Note that timestamps are appended to A*E object
names to guarantee uniqueness even if multiple generated files are loaded into
the same A*E image.
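As a rough sketch of this naming scheme and of the code-emitting step, the following Java class builds names from a running counter plus a millisecond timestamp and prints a simplified define-instance form. The class `AeEmitterSketch` and its methods are hypothetical, the timestamp pattern `yyyyMMddHHmmssSSS` is inferred from the names shown in Listing 1.0, and the `xml:Has-AttValue` slot is omitted for brevity; it is not the emitter used in the prototype.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.function.Supplier;

final class AeEmitterSketch {

    // Pattern inferred from names such as xml:Attribute9-19991215222255155.
    private static final DateTimeFormatter STAMP =
        DateTimeFormatter.ofPattern("yyyyMMddHHmmssSSS");

    private final Supplier<LocalDateTime> clock; // injectable so output is testable
    private int counter = 0;                     // shared across all emitted objects

    AeEmitterSketch(Supplier<LocalDateTime> clock) {
        this.clock = clock;
    }

    /** Returns a name of the form xml:<kind><counter>-<timestamp>. */
    String uniqueName(String kind) {
        counter++;
        return "xml:" + kind + counter + "-" + STAMP.format(clock.get());
    }

    /**
     * Emits a simplified define-instance form for one XML attribute.
     * Values are assumed not to contain double quotes (no escaping is done).
     */
    String emitAttribute(String ownerDocument, String name, String value) {
        return "(define-instance " + uniqueName("Attribute") + "\n"
            + "  xml:Attribute\n"
            + "  (xml:Has-Name \"" + name + "\")\n"
            + "  (xml:ownerDocument " + ownerDocument + ")\n"
            + "  (xml:value \"" + value + "\")\n"
            + ")\n";
    }

    public static void main(String[] args) {
        AeEmitterSketch emitter = new AeEmitterSketch(LocalDateTime::now);
        String doc = emitter.uniqueName("Document");
        System.out.print(emitter.emitAttribute(doc, "DATA_ID", "1001"));
        System.out.print(emitter.emitAttribute(doc, "LOAN_AMOUNT", "100000.0"));
    }
}
```

The counter keeps names distinct within one generated file, while the timestamp keeps names from different files apart when they are loaded into the same image.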
4.2 The Rule Engine
The application loads the data from the parser, made available as schemas in Art*Script. The rule engine first computes the mortgage payment (principal and interest) based on the loan amount, the interest rate and the loan term. The rule engine then determines the eligibility of the case based on two ratios: (i) the front ratio, the ratio of monthly housing expenses to monthly income, and (ii) the back ratio, the ratio of total monthly expenses (housing + other debts/commitments) to monthly income. A simple threshold criterion is applied to determine the eligibility of a loan: the front ratio should be no more than 28% and the back ratio less than 36%. Upon completion of processing, the application produces a recommendation and an XML output object data set. This recommendation and the computed ratios are part of the output object data from the application.
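The computation just described can be written out as a small Java sketch. It uses the standard fixed-rate amortization formula for the monthly principal-and-interest payment and the two thresholds stated above. The class `UnderwritingSketch`, its method names, and the income and debt figures are hypothetical; the 6.5% annual rate is also an assumption, chosen because with a 100000.0 loan over 30 years it yields a payment that rounds to 632.07. The actual rules are written in A*E, not Java.

```java
final class UnderwritingSketch {

    /** Monthly principal-and-interest payment for a fixed-rate loan. */
    static double monthlyPayment(double principal, double annualRate, int years) {
        double r = annualRate / 12.0; // monthly interest rate
        int n = years * 12;           // number of monthly payments
        if (r == 0.0) {
            return principal / n;     // zero-interest edge case
        }
        return principal * r / (1.0 - Math.pow(1.0 + r, -n));
    }

    /** Front ratio: monthly housing expenses over monthly income. */
    static double frontRatio(double housing, double income) {
        return housing / income;
    }

    /** Back ratio: housing plus other monthly debts over monthly income. */
    static double backRatio(double housing, double otherDebts, double income) {
        return (housing + otherDebts) / income;
    }

    /** Thresholds from the text: front no more than 28%, back less than 36%. */
    static boolean eligible(double housing, double otherDebts, double income) {
        return frontRatio(housing, income) <= 0.28
            && backRatio(housing, otherDebts, income) < 0.36;
    }

    public static void main(String[] args) {
        // Loan amount and term follow the example; rate, debts and income are assumed.
        double pi = monthlyPayment(100000.0, 0.065, 30);
        double otherDebts = 300.0;
        double income = 3000.0;
        System.out.println(String.format("MONTHLY_PI=%.2f", pi));
        System.out.println(eligible(pi, otherDebts, income) ? "ELIGIBLE" : "INELIGIBLE");
    }
}
```

For simplicity the sketch treats the principal-and-interest payment as the whole monthly housing expense; taxes and insurance would be added to `housing` where they apply.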
The result, an XML document, is now available
either for display or for further processing.
Listing 2.0 shows the output from the application.
<?xml version="1.0"?>
<OUTPUT_DOCUMENT DATA_ID="1001">
  <RECOMMENDATION_SECTION DATA_ID="1001" RECOMMENDATION="ELIGIBLE">
  </RECOMMENDATION_SECTION>
  <FILE_ID_SECTION DATA_ID="1001" FILE_TYPE="XML version 1.0" DATE="Fri Jan 14 13:02:42 2000" AUTHOR="BRIGHTWARE">
  </FILE_ID_SECTION>
  <BORROWER_SECTION DATA_ID="1001" BORROWER_NAME="Home Buyer One" BORROWER_SSN="123-45-6789">
  </BORROWER_SECTION>
  <LOAN_DETAIL_SECTION DATA_ID="1001" LOAN_TERM="30" LOAN_TYPE="FRM" LOAN_AMOUNT="100000.0" MONTHLY_PI="632.07">
  </LOAN_DETAIL_SECTION>
</OUTPUT_DOCUMENT>
Listing 2.0: XML document output from the A*E application
For the moment, in order to display the output
results in a client browser, we use the same application servlet to convert the
XML into HTML format.
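One simple way such a conversion could work is to walk the output document and render each section's attributes as table rows. The sketch below is an assumption about the general approach, not the servlet code itself: the class `XmlToHtmlSketch` and the table layout are hypothetical, and it uses the DOM parser bundled with the JDK.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class XmlToHtmlSketch {

    /** Escapes the characters that are significant in HTML text. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    /** Renders each child section of the root element as rows of an HTML table. */
    static String toHtml(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (SAXException | IOException | ParserConfigurationException e) {
            throw new IllegalArgumentException("XML rejected: " + e.getMessage(), e);
        }
        StringBuilder html = new StringBuilder("<table>\n");
        Node root = doc.getDocumentElement();
        for (Node section = root.getFirstChild(); section != null;
                section = section.getNextSibling()) {
            if (section.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip whitespace text between sections
            }
            html.append("<tr><th colspan=\"2\">")
                .append(escape(section.getNodeName())).append("</th></tr>\n");
            NamedNodeMap attrs = section.getAttributes();
            for (int i = 0; i < attrs.getLength(); i++) {
                Node a = attrs.item(i);
                html.append("<tr><td>").append(escape(a.getNodeName()))
                    .append("</td><td>").append(escape(a.getNodeValue()))
                    .append("</td></tr>\n");
            }
        }
        return html.append("</table>\n").toString();
    }

    public static void main(String[] args) {
        // Abbreviated version of the output document in Listing 2.0.
        String xml = "<?xml version=\"1.0\"?>"
            + "<OUTPUT_DOCUMENT DATA_ID=\"1001\">"
            + "<RECOMMENDATION_SECTION DATA_ID=\"1001\" RECOMMENDATION=\"ELIGIBLE\">"
            + "</RECOMMENDATION_SECTION>"
            + "</OUTPUT_DOCUMENT>";
        System.out.print(toHtml(xml));
    }
}
```

A stylesheet-based transformation would be an alternative to this hand-written walk once the output format stabilizes.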
5. Conclusion
XML is an appropriate way of representing the semi-structured data that is present on the Internet. One important consideration is the need for standardization of parsers in the AI-XML world. Standardization would allow us to focus on solving problems, rather than selecting, for example, one of a set of parsers and investing considerable time in a debate over which parser to use.
A similar need for standardization lies in designing specialized markup languages based on XML. Although XML allows for extension by designing other forms of markup, it is not especially desirable if a separate language is developed every time XML is used. However, sizeable vertical industry segments, like the mortgage industry, would benefit from a specialized markup of terms relevant to the mortgage domain.
Usage of XML, however, does not completely eliminate the need for building domain-specific application input layers. In our example, the parser generates A*E objects; however, the format of these objects is largely the same across all domains. If we have specific information about a given domain, then that information can be used to design A*E rules that transform the objects into a format appropriate for the domain. Alternatively, additional methods could be written on the parser’s generated Java objects that generate more domain-specific objects. Note that none of this is absolutely necessary: rules can be written using the objects as is, but such an enhancement would make it easier to write rules for a specific domain.
6. Future Directions
Organizations implementing e-commerce applications are rapidly adopting XML as their de facto standard for data transfer between applications and partners. Dedicated industry groups like the Mortgage Industry Standards Organization (www.mismo.org) have begun standardizing the XML transaction architecture for their clientele. Key players in the technology sector like IBM (IBM, http://www.research.ibm.com/rules/home/html) and Sun (Sun Microsystems, http://jsp.java.sun.com/javaone) are focusing on implementing and integrating rule engines as part of their enterprise e-commerce architecture. These efforts underscore both the need for the XML-rule-based interface described in this paper and the need for a domain-specific application input layer.
For knowledge-based systems to be practical and offer effective solutions to the industry, it is imperative that knowledge engineers, and the products they use, take advantage of the extensibility and other features uniquely brought forward by XML.
7. References
1. Flynn, P., et al., University College, Cork, Internet Web Site, http://www.ucc.ie/xml.
2. Glushko, R., Tenenbaum, J., and Meltzer, B., An XML Framework for Agent-Based E-commerce, Communications of the ACM, 42:3, March, 1999.
3. Gold-Bernstein, B., From EAI to e-AI, Application Development Trends, v6 n12, 1999.
4. Hayes, C., Cunningham, P., Distributed CBR using XML, Proceedings of the workshop: Intelligent systems and Electronic Commerce, Bremen, 1998.
5. Hayes, C., Cunningham, P., Shaping a CBR View with XML, Technical Report TCD-CS-1999-23, Trinity College, Dublin, 1999.
6. Kambhatla, N., Budzikowska, M., Levesque, S., Nicolov, N., Zadrozny, W., Wicha, C., and MacNaught, J., DMML: An XML Language for Interacting with Multi-Modal Dialog Systems, Proceedings of the Twelfth Conference on Innovative Applications of Artificial Intelligence, Austin, Texas, August, 2000.
7. Watson, I., Applying Case-Based Reasoning: Techniques for Enterprise Systems, Morgan Kaufmann Publishers Inc., 1997.
8. IBM, Internet Web Site, http://www.research.ibm.com/rules/home/html.
9. Mortgage Industry Standards Organization, Internet Web Site, http://www.mismo.org.
10. Sun Microsystems, Internet Web Site, http://jsp.java.sun.com/javaone.
11. World Wide Web Consortium, Internet Web Site, http://www.w3.org/XML.
12. World Wide Web Consortium 2, Internet Web Site, http://www.w3.org/TR.
13. XML.com, Internet Web Site, http://www.xml.com/pub/98/10/guide2.html.
14. The XML Cover Pages, Internet Web Site, http://www.oasis-open.org/cover.
15. The XML Cover Pages 2, Internet Web Site, http://www.oasis-open.org/cover/AIML-alice.html.