REST Easy In the Semantic Web:
Describing and Querying “Triple Metadata” with REST URLs

by Kevin T. Smith and Mary Parmelee

The vision of the Semantic Web is a World Wide Web of self-describing resources that are not just interconnected with cryptic hyperlinks, but interrelated with descriptive metadata. The foundation of the Semantic Web is built on simple metadata descriptors known as RDF “triples”. Each triple expresses a single, atomic metadata statement that relates a resource (subject) to an attribute value or other resource (object) via a property (predicate). A triple can be represented as a directed graph, where the subject and object are nodes, and the predicate is a directed edge pointing from the subject node to the object node. Multiple triples can be connected to create complex descriptions of interrelated resources. The simplicity of the triple is its greatest strength, because a subject-predicate-object relationship can literally be used to describe anything about anything. Whether it is a Web page, a live person, or a concept, you can describe it using RDF triples.

But what happens when we need to describe something about the triple itself? For instance, who created a specific triple? How current is it? Can the statement that the triple makes be trusted? As Semantic Web technology evolves from theory to practice, the need to track metadata about triples is becoming obvious. In this article we will propose using REST URLs as a mechanism for storing metadata about RDF triples, and for querying triples from your persistence store.

Beyond the Semantic Web for which it was intended, RDF is now being applied to mission-critical enterprise and government applications. Using RDF to manage proprietary data in a secure environment comes with a formidable set of challenges. Many enterprise-scale systems must provide fine-grained access control, data integrity, and quality assurance. This level of control depends on collecting, managing, and storing metadata about each RDF triple, so that contextual information such as provenance (history of ownership) and security can be associated with the subject-predicate-object relationship. The RDF language includes a standard mechanism for this purpose called reification, but it requires four additional triples (rdf:type rdf:Statement, rdf:subject, rdf:predicate, and rdf:object) for every statement described, plus one more triple for each metadata attribute. The resulting scalability and robustness concerns make this approach impractical in production systems.

Numerous other approaches are currently being debated by the Semantic Web community, and a few are being implemented by open source and commercial tools. Many tools and vendors now support “quad stores”, which extend each triple in a persistence store with a fourth component for “context” (metadata about the triple). With quads, quad-based reification becomes possible: the fourth item is a URI that uniquely identifies and reifies the triple without generating additional triples. Context components are typically used to associate triples with a model or named graph identifier. However, many applications need to associate triples with more complex metadata. Some experts suggest extending quads to quints and even beyond to n-tuples for storing complex metadata, data structures, and even chunks of code that apply to a triple.
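The storage cost of standard reification can be made concrete with a small sketch. It models triples as plain Python tuples rather than using an RDF library, and the statement identifier "_:stmt1" is a hypothetical blank-node label. It counts the extra triples needed to attach three metadata attributes to one statement, compared with a single quad that carries the same information.

```python
# A minimal sketch comparing standard RDF reification with a quad.
# Triples are plain tuples; "_:stmt1" is a hypothetical statement identifier.

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def reify(statement_id, triple, metadata):
    """Return the extra triples standard reification needs for one statement.
    The original triple itself would still be stored separately."""
    s, p, o = triple
    reified = [
        (statement_id, RDF + "type", RDF + "Statement"),
        (statement_id, RDF + "subject", s),
        (statement_id, RDF + "predicate", p),
        (statement_id, RDF + "object", o),
    ]
    # Each metadata attribute becomes one more triple about the statement.
    reified += [(statement_id, key, value) for key, value in metadata.items()]
    return reified

triple = ("http://www.trumantruck.com/restEasy", "author", "Kevin Smith")
metadata = {"asserter": "JSutton", "assertedDate": "20060615",
            "securityClass": "UNCLASSIFIED"}

reified = reify("_:stmt1", triple, metadata)

# A quad carries the same information in a single row.
quad = triple + ("http://foo/metadata?asserter=JSutton&assertedDate=20060615"
                 "&securityClass=UNCLASSIFIED",)

print(len(reified), "reification triples vs. 1 quad")  # 7 reification triples vs. 1 quad
```

The gap grows linearly: every additional metadata attribute adds another reification triple, while the quad stays a single row.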

Defining “Triple Metadata” in a Quad

As quad stores such as Sesame2, Kowari, and NG4J become more prevalent, we propose a new way to use quads, where the fourth element in a quad store is an informational REST URL that also functions as a query to the persistence store. The URL is based on a REST GET query pattern, and will contain not just model or named graph information, but all metadata about the triple. This metadata element can be used not only in standard queries (using SPARQL, for instance) to refine regular triple-based queries, but also as a Servlet or CGI-script URL that retrieves RDF triples based on their metadata.

In this article, we will describe a quad using the format
<Subject, Predicate, Object, Metadata>
where the fourth element in the quad contains metadata about the Subject-Predicate-Object triple. Before defining the format of the “Metadata” component, here are two quads representing two assertions in our persistence store:

<“http://www.trumantruck.com/restEasy”, author, “Kevin Smith”, Metadata>
<“http://www.trumantruck.com/restEasy”, author, “Mary Parmelee”, Metadata>

So what metadata could these statements carry? The name of the person who asserted that we authored this article is one candidate; the date the assertion was made is another. Suppose the assertion was made by a user named JSutton on June 15, 2006. Because the authors of this article are government contractors, we will also assign it a security classification of UNCLASSIFIED.

Now let’s define the metadata section, using a REST URL. We suggest an easy and flexible approach, defining the Metadata URL to use the format of:

http://yourHost/metadata?attribute1=value1&attribute2=value2 (and so on)

For example, using the metadata described above, the Metadata URL for the two triples could be:

http://foo/metadata?asserter=JSutton&assertedDate=20060615&securityClass=UNCLASSIFIED
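Because the Metadata component is an ordinary URL query string, standard URL-encoding tools can produce and parse it. The sketch below uses Python's urllib.parse; the helper names build_metadata_url and parse_metadata_url are hypothetical, and the approach assumes flat, single-valued attributes.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

# A minimal sketch, assuming metadata is a flat set of attribute/value pairs.
# The base URL "http://foo/metadata" comes from the article's example.

def build_metadata_url(base, metadata):
    """Encode metadata attributes as the query string of a REST URL."""
    return base + "?" + urlencode(metadata)

def parse_metadata_url(url):
    """Recover the attribute/value pairs from a metadata URL."""
    return dict(parse_qsl(urlsplit(url).query))

url = build_metadata_url("http://foo/metadata", {
    "asserter": "JSutton",
    "assertedDate": "20060615",
    "securityClass": "UNCLASSIFIED",
})
print(url)
# http://foo/metadata?asserter=JSutton&assertedDate=20060615&securityClass=UNCLASSIFIED
print(parse_metadata_url(url)["asserter"])  # JSutton
```

Using urlencode rather than string concatenation means that values containing reserved characters such as & or = are percent-escaped and survive the round trip.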

That URL could be used in both quads that we described above. If we wanted to query the triple store, using a constraint based on the triple metadata, an example SPARQL query could be:

  SELECT ?subject ?property
  WHERE {
    GRAPH ?data {
      ?subject ?property "Kevin Smith" .
    }
    FILTER regex(str(?data), "asserter=JSutton")
  }

Such a query would allow you to constrain RDF queries, using the triple metadata as a filter.
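The effect of this query can be emulated over an in-memory list of quads, which may help show how GRAPH ?data binds the Metadata URL. This is a sketch under stated assumptions (quads as Python tuples, and a third quad invented for contrast), not a SPARQL engine.

```python
import re

# A minimal in-memory emulation of GRAPH ?data { ?s ?p "Kevin Smith" }
# plus FILTER regex(str(?data), "asserter=JSutton"). Quads are
# (subject, predicate, object, metadata_url) tuples.

META = ("http://foo/metadata?asserter=JSutton&assertedDate=20060615"
        "&securityClass=UNCLASSIFIED")

quads = [
    ("http://www.trumantruck.com/restEasy", "author", "Kevin Smith", META),
    ("http://www.trumantruck.com/restEasy", "author", "Mary Parmelee", META),
    # A hypothetical quad asserted by someone else; the filter excludes it.
    ("http://example.org/other", "author", "Kevin Smith",
     "http://foo/metadata?asserter=Someone&securityClass=UNCLASSIFIED"),
]

def select_subject_property(quads, obj, pattern):
    """Return (subject, property) pairs whose object equals obj and whose
    metadata URL (the graph name) matches the regular expression."""
    return [(s, p) for s, p, o, meta in quads
            if o == obj and re.search(pattern, meta)]

print(select_subject_property(quads, "Kevin Smith", "asserter=JSutton"))
# [('http://www.trumantruck.com/restEasy', 'author')]
```

A plain substring pattern also matches values such as asserter=JSuttonJr. Anchoring the pattern to the parameter boundaries, for example [?&]asserter=JSutton(&|$), avoids that, and the same anchored expression works in the SPARQL regex filter.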

RESTful Benefits

The beauty of this solution is that, once we have defined this metadata URL, we can implement a servlet or CGI script that queries the persistence store based on the URL and returns RDF serialized as XML.

So the original metadata URL (http://foo/metadata?asserter=JSutton&assertedDate=20060615&securityClass=UNCLASSIFIED) can now be used to query the persistence store for everything UNCLASSIFIED that JSutton asserted on June 15, 2006. In addition to the triples we have already defined (the fact that Kevin Smith and Mary Parmelee wrote this article), any other triples stored with the same metadata would be returned. For example:

<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:md="http://www.trumantruck.com/metadata/"
  xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://www.trumantruck.com/restEasy">
     <md:author>Mary Parmelee</md:author>
     <md:author>Kevin Smith</md:author>
     <dc:title>REST Easy in the Semantic Web</dc:title>
  </rdf:Description>
  <rdf:Description rdf:about="http://www.nhl.com/">
    <dc:title>National Hockey League Web Site</dc:title>
    <md:userRankOpinion>Very High</md:userRankOpinion>
  </rdf:Description>
</rdf:RDF>
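A servlet or CGI script of the kind described above might look roughly like the following, written here as a Python WSGI application instead of a Java Servlet. The in-memory QUADS list stands in for a real persistence store, the fourth quad is hypothetical, and the matching rule (every query parameter must appear with the same value in a quad's Metadata URL) is an assumption.

```python
from urllib.parse import parse_qsl, urlsplit
from xml.etree import ElementTree as ET

# A minimal sketch of a metadata-query endpoint. Predicates are full URIs
# in slash-terminated namespaces (dc:, md:) so they can be split for RDF/XML.

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
MD = "http://www.trumantruck.com/metadata/"
for prefix, uri in (("rdf", RDF_NS), ("dc", DC), ("md", MD)):
    ET.register_namespace(prefix, uri)

META = ("http://foo/metadata?asserter=JSutton&assertedDate=20060615"
        "&securityClass=UNCLASSIFIED")
QUADS = [
    ("http://www.trumantruck.com/restEasy", MD + "author", "Mary Parmelee", META),
    ("http://www.trumantruck.com/restEasy", MD + "author", "Kevin Smith", META),
    ("http://www.nhl.com/", DC + "title", "National Hockey League Web Site", META),
    ("http://example.org/", DC + "title", "Hypothetical other entry",
     "http://foo/metadata?asserter=Someone&securityClass=UNCLASSIFIED"),
]

def matches(metadata_url, constraints):
    """True if every constraint appears with the same value in the URL."""
    metadata = dict(parse_qsl(urlsplit(metadata_url).query))
    return all(metadata.get(k) == v for k, v in constraints.items())

def to_rdf_xml(triples):
    """Serialize (subject, predicate, literal) triples as RDF/XML,
    grouping statements by subject."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    descriptions = {}
    for s, p, o in triples:
        if s not in descriptions:
            descriptions[s] = ET.SubElement(
                root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": s})
        ns, local = p.rsplit("/", 1)  # split namespace from local name
        ET.SubElement(descriptions[s], f"{{{ns}/}}{local}").text = o
    return ET.tostring(root, encoding="unicode")

def metadata_app(environ, start_response):
    """WSGI entry point: GET /metadata?attr=value&... returns RDF/XML."""
    constraints = dict(parse_qsl(environ.get("QUERY_STRING", "")))
    triples = [(s, p, o) for s, p, o, meta in QUADS if matches(meta, constraints)]
    body = to_rdf_xml(triples).encode("utf-8")
    start_response("200 OK", [("Content-Type", "application/rdf+xml"),
                              ("Content-Length", str(len(body)))])
    return [body]
```

To try it locally, wsgiref.simple_server.make_server("", 8000, metadata_app).serve_forever() serves the app, after which a GET on http://localhost:8000/metadata?asserter=JSutton&assertedDate=20060615&securityClass=UNCLASSIFIED returns the matching triples. Note that an empty query string matches every quad, so a production version would need to enforce access control before serializing anything.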

As you can see, the added benefit of the informational REST URL in the quad is that it can be used for “triple metadata” queries. This allows you to manipulate triples like records in a database without extending to n-tuples.

Metadata about triples could even support simple workflow, trust, and data integrity functions. For example, the original metadata URL can be modified to represent JSutton as a trusted asserter, and to record the date and time that the triple was asserted and deleted.

http://foo/metadata?trustedAsserter=JSutton&assertedDateTime=20060615120107&deletedDateTime=20060621090036&securityClass=UNCLASSIFIED

As you can see, this example REST URL lets you retrieve only trusted triples and supports time-interval queries. It also supports provenance: because a deleted triple is marked with a deletedDateTime rather than removed, it can still be retrieved after a user has deleted it.
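A time-interval check over such URLs could be sketched as follows. It assumes the YYYYMMDDhhmmss timestamp format used in the example and treats the presence of a trustedAsserter attribute as the trust signal; the function name trusted_and_current is hypothetical.

```python
from datetime import datetime
from urllib.parse import parse_qsl, urlsplit

# A minimal sketch, assuming the trustedAsserter, assertedDateTime and
# deletedDateTime attributes shown above. Deleted triples stay in the store
# with a deletedDateTime, so their history remains queryable.

STAMP = "%Y%m%d%H%M%S"

def trusted_and_current(metadata_url, instant):
    """True if the triple came from a trusted asserter and, at the given
    instant, had been asserted but not yet deleted."""
    meta = dict(parse_qsl(urlsplit(metadata_url).query))
    if "trustedAsserter" not in meta:
        return False
    asserted = datetime.strptime(meta["assertedDateTime"], STAMP)
    deleted = meta.get("deletedDateTime")
    if deleted is not None and datetime.strptime(deleted, STAMP) <= instant:
        return False
    return asserted <= instant

url = ("http://foo/metadata?trustedAsserter=JSutton"
       "&assertedDateTime=20060615120107&deletedDateTime=20060621090036"
       "&securityClass=UNCLASSIFIED")

print(trusted_and_current(url, datetime(2006, 6, 18)))  # True: asserted, not yet deleted
print(trusted_and_current(url, datetime(2006, 6, 22)))  # False: already deleted
```

Querying with an instant between the two timestamps returns the triple, while querying after deletion excludes it without the triple ever being physically removed.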

Conclusion

This article has presented a REST URL approach for expressing metadata about triples in quad stores. As an alternative to traditional reification, it provides a lightweight storage option that reduces persistence store bloat and makes a solution more scalable over time. This flexible approach can handle any kind of triple metadata, from security information to provenance tracking. While the method still needs to be tested for factors such as query performance, we believe it would work well for loosely coupled, distributed systems in which many asserters make changes without knowledge of each other’s actions. Finally, the URL that expresses metadata can also be implemented on the server side to perform metadata queries, providing a RESTful, web-based GET of the triples in the persistence store.


About the Authors

Kevin T. Smith is a Technical Director at McDonald Bradley, where he leads the SOA & Semantics Security Team (S3T), focusing on security solutions for projects revolving around SOA and the Semantic Web. He holds graduate degrees in Computer Science, Software Systems Engineering, and Information Systems Security. He is the author of many articles revolving around information security and software engineering, and he is also the author of several technology books, including The Semantic Web: A Guide to the Future of XML, Web Services and Knowledge Management. Smith is a frequent speaker and workshop presenter at technology conferences, such as the RSA Security Conference, The Object Management Group (OMG), JavaOne, ApacheCon, and Net-Centric Warfare.

Mary Parmelee is an Ontologist in McDonald Bradley Inc.’s Advanced Programs Group, where she leads the Semantic Services Team specializing in semantic technology applications for data interoperability. Mary holds an MSIS degree from the University of North Carolina at Chapel Hill, is active in semantic technology professional groups, and is an invited speaker on semantic technology at international conferences and technical workshops.