Running the Web Backwards: Appliance Data Services

Andrew C. Huang, Benjamin C. Ling, John J. Barton*, Armando Fox
Stanford University and *Hewlett-Packard Laboratories

Abstract

"Appliance" digital devices such as handheld cameras, scanners, and microphones generate data that people want to put on Web pages. Unfortunately, numerous complex steps are required. Contrast this with Web output: handheld web browsers enjoy increasing infrastructral support. We hypothesize that the utility of input appliances will be greatly increased if they too were "infrastructure enabled". Appliance Data Services attempts to systematically describe the task domain of providing seamless and graceful interoperability between input appliances and the Web. We offer an application architecture and a validating prototype that we hope will "open up the playing field" and motivate further work.

Our initial efforts have identified three design challenges: device heterogeneity, user interface, and harnessing infrastructure services. Our architecture isolates device and protocol heterogeneity considerations into a single extensible architectural component, allowing most of the application logic to deal exclusively with Web-friendly protocols and formats. To distribute the user interface between the network infrastructure and user devices, we tag input with commands specifying how the data is to be manipulated once injected into the infrastructure. Additional conceptual contributions include canonicalization and late binding of these commands to applications, two mechanisms that improve the user experience by allowing "natural" extensions of the device's UI to be used for application selection and by minimizing the amount of configuration required before end-users benefit from Appliance Data Services. Finally, we describe how services can be applied to process data, using simple HTTP-connected conversion services as examples; we could also leverage services connected via Jini or CORBA. We also describe an implemented prototype of parts of the architecture and a specific application.

Background, Motivation, and Challenges

Much recent work has focused on accessing the Internet from "post-PC" devices, especially "information appliances" such as PDA's, cell phones, and palmtop computers [FGCB98]. Surveying that work, we see that these devices, despite their inherent hardware, software, and network limitations, can interoperate with the rest of the Internet through infrastructure support. In particular, software such as transformation proxies [FGC+97,Pro97,FGG+98] and wireless protocol gateways [Met95, WAP97] enable these devices to leverage the enormous installed infrastructure of servers, content, and interactive services. In the words of one mobile computing project [KB+96], "Access is the killer app" for such devices. We capture this effect by saying that the post-PC devices have become more useful because they are now infrastructure enabled. Conversely, and partially as a result of infrastructure enablement, the Internet has begun to adapt to these devices and we are now seeing services tailored for their use, such as Yahoo Mobile and a variety of sites that feature "Palm-friendly" pages in addition to their desktop content. These new mobile computing devices are no longer isolated "islands" of computation; they are participants in an Internet system.

Infrastructure Enablement for Input Centric Devices

Almost all of the work on device access to the Internet has been focused on devices as web-browsers. We can say that it is focused on output from the web. Our goal is to achieve infrastructure enablement for digital input-centric consumer appliances; we believe web-like infrastructure provides a good model for this enablement. Thus we aim to "run the web backwards", adding infrastructure to support input from portable input-centric digital devices.

By input-centric we refer to devices whose primary function is not to extract and browse digital information, but to create it. Examples include digital still cameras and video cameras, handheld scanners, and portable audio recorders. In part because of the pervasiveness and success of the established Internet infrastructure, much of the data created by these devices ends up in the Internet infrastructure, e.g. posted on Web sites. However, although much has been done to simplify the process of extracting information from the Internet to a variety of devices, the process for injecting data from devices into the infrastructure is extremely painful. We propose a framework and software components for facilitating this process, to stimulate further work that will allow input-centric devices to enjoy the same success that information-browsing appliances have enjoyed by becoming infrastructure-enabled. We call the resulting system "Appliance Data Services."

Contributions

The three main contributions of this paper are a systematic identification of the problem space, an architecture for the infrastructure-enablement of input appliances, and an implemented prototype of parts of the architecture. Our discussion of the problem space was informed by experiences with existing web-input approaches and the process of formulating the architecture and prototype described here. The architecture attempts to reconcile the diversity of devices and protocols to be supported, the desired flexibility in selecting from a wide range of available Web applications, and the requirement of an unobtrusive "no-futz" user experience in operating the device and connecting it to the Web. In particular we identify command tagging as a fundamental requirement for Appliance Data Services (ADS), describe an implemented approach to tagging that is extensible to a range of devices and communication protocols, and propose command canonicalization and late binding as specific architectural mechanisms to address the user experience concerns. We describe our proposed framework in light of the technical and usability challenges it addresses, describe our experience with a working prototype, and use it to motivate further research.

Our prototype implementation sketches all the elements of the ADS so that we can experiment with an end-to-end prototype. We tried to avoid dependency on a particular software framework for the application logic; as we describe, a number of such frameworks are either commercially available or under development as research projects, and we wish to enable interoperation with as many of these as possible. The architectural mechanisms we describe can be implemented in the context of any of these frameworks; we describe our experience building a prototype using simple Web protocols.

The remainder of this section motivates the problem and identifies design challenges through a simple scenario, and introduces command tagging as a fundamental requirement for ADS. The second section describes the architecture and our implemented prototype, identifying ADS as a particular application domain for infrastructure software frameworks such as Jini [AOS+99] and Ninja [GW+99], which we describe briefly. We conclude by describing our proposed extensions, lessons learned from the initial prototype, and discussion of work in progress.

Running the Web Backwards

Imagine that we have taken a photo with a digital still camera and that we would like to publish the photo on the Web. Many current digital cameras are equipped with infrared transceivers, which can be used to communicate photo data to another IR-equipped device such as a laptop or desktop PC. In an ideal world, we would point the camera's IR at the Internet-connected laptop or desktop, press a button to "squirt" the photo data out of the camera, and sit back while infrastructure-deployed software reads the image, converts it to a Web-compatible encoding, authenticates itself to a remote server using our credentials, posts the image file there, and arranges for an HTML link on a designated "photo album" page to point to it.

The reality is considerably uglier, as we found when we helped a California elementary school publish their "Science Fair" results on the Web. We photographed each of 156 student posters with a digital camera and scanned one key item from each poster using a handheld scanner. To upload the data to the Web, we pointed the IR transceivers on the camera and scanner to an IR-equipped laptop connected to the wired Internet. Unfortunately, we had to install and learn to use different IR receiving software for each device. Furthermore, the uploaded images were only identifiable through serial numbers embedded in their file names on the laptop, leading to a lot of manual file copying and filename manipulation to coordinate the two data streams. The scanner's TIFF files had to be manually converted to JPEG for Web publishing. Finally, the school's Web server allowed only FTP write-once access, so that changes after uploading required manual intervention by a site administrator.

Taking these obstacles together, it is not surprising that the Web has so far failed to magnify the impact and usefulness of these input devices as it has for (e.g.) PDAs and cell phones. Small-scale efforts such as the PhotoNet and Cartogra web sites attempt to simplify this problem, but they address only one very specific application, namely the publishing of photos to a site-maintained album in one of a limited set of presentation formats. We would like a more general solution that

  1. handles devices other than cameras such as handheld scanners and digital audio recorders,
  2. allows the user to specify what happens to the data (how it is manipulated) once it enters the infrastructure, and
  3. can coordinate the input from multiple devices.

Such an infrastructure would make it possible to have photos automatically routed to "the science fair application", which would combine them with scanner images from the project's conclusion and add them in a predefined format to the Science Fair website. Selected photos might also be routed to the user's personal photo album, perhaps maintained on a site such as PhotoNet or Cartogra.

Design Challenges

Recording the science fair was painful for two reasons. First, at a low level, appliances do not understand Internet standards such as HTTP and HTML, although this is slowly changing. Second and more importantly, at a higher level, there is no deployed application infrastructure that automatically performs the data transformations, protocol conversion, and data routing in a way that is largely transparent from the end user's point of view.

This observation leads us to identify three specific design challenges for deploying software infrastructure to address this problem:

  1. Dealing with device heterogeneity: This challenge involves being able to handle the various data formats and protocols chosen by device vendors (e.g., JetSend, IRDA, and cradle synchronization). This aspect is further complicated when we want to be able to merge multiple data streams from different devices, as in the "Science Fair" scenario. In addressing this challenge, we leverage significant prior work on dealing with heterogeneity in the context of information delivery to post-PC devices [FGCB98]. As in the prior work, it is important that we provide an extensible solution that will generalize to other devices and protocols.
  2. Providing "no-futz", out-of-the-box operation: Task-specific devices such as digital cameras have quite limited user interfaces by design: unobtrusive and familiar interface metaphors, such as the "point and shoot" metaphor for a camera, make the devices easy to use. However, it seems intrinsic to this task domain that users need to be able to specify what should happen to the data they are "injecting" into the infrastructure. The challenge is how to provide a flexible mechanism for specifying this without adding obtrusive features to the device UI, while still accommodating devices such as digital microphones, whose UIs may be essentially nonextensible. As a corollary, we want to enable devices to exhibit some reasonable "out of the box" behavior with zero user configuration, to allow new users to immediately begin experiencing the value of combining the device with Internet services.
  3. Leveraging existing infrastructure services: The emergence of Internet services and service infrastructures has begun to transform how applications are conceived and delivered. We want to explore modular composition of Internet services for the construction of applications that support or exploit data input devices.

Our answer to these challenges is an Appliance Data Services application architecture.

Architecture

Before describing the architecture in detail, we clarify our use of two specific terms: "infrastructure" and "service framework".

By "infrastructure" we refer to the deployed collection of hardware and software accessible directly or indirectly via an Internet (usually Web) programmatic interface.  In addition to supporting "destination applications" such as web content, Web-accessible services (banking, etc.) and Web-accessible databases, the infrastructure includes "faceless" application components.  For example, most Web users never interact directly with Web content caches or accelerators, but they are deployed throughout the infrastructure and programmatically accessed by destination sites.  As another example, many portal sites that provide a search or query feature implement the feature by performing the search on a remote machine operated by a different vendor, and reformat the results in HTML for presentation to the user.  In other words, the Internet infrastructure is, roughly speaking, the collection of all code that implements or supports Internet sites, and the software and hardware mechanisms by which they communicate.

By "service framework" we refer to any programmatic framework for deploying new services in the infrastructure.  An example of a somewhat ad-hoc framework that emerged early in the Web's development is the collection of mechanisms used for Web site intercommunication and code execution: HTTP, CGI-bin, SSL handshaking, etc.  In this framework, each HTTP-accessible web service can be considered a "service module", and the modules communicate via HTTP and SSL.  Various frameworks under development in academic research and industry, including Jini [AOS+99], Ninja [GW+99], ChaiServer [chai] and HP e-speak, feature a richer set of programmatic abstractions.  Typically a service framework provides three sets of mechanisms: the ability to compose services, either by writing a meta-service that calls each subservice or by providing a way to name and execute a pipeline-like composition of services; an inter-service communication mechanism (HTTP, Java Remote Method Invocation, etc.) that may provide specific features such as authentication and security; and a registration/discovery service that tracks which services are available and allows services to be looked up by attribute value.

Conceptual diagram of the Appliance Data Services architecture. Devices and services are joined through access points and the application controller.

The ADS architecture, shown in the above figure, was developed to address the challenges outlined in the previous section. The envisioned use of such a framework begins with a user transferring data into the system via an Access Point, a network-connected hardware/software gateway that receives data from the user's appliance. The Access Point passes the data, user identifier and command-tag associated with the data to an Application Controller, which determines how the data is to be handled. The Application Controller activates the necessary infrastructure-resident Modular Composable Services that process and store the data. Each of these components mentioned in this usage model is justified and described in detail in the following sections.

Access Point

The Access Point is the point of entry for data that is to be pushed into the system. It consists of necessary hardware and software to receive data from appliances.   Hardware might include an IR transceiver, RF basestation hardware, or a cradle or cable for "docking" an appliance.  Software includes both hardware-specific drivers and the Access Point functionality described below.  The Access Point could be implemented as a commodity PC outfitted with the appropriate hardware interfaces, or it could be designed as a special-purpose "network appliance".

The main architectural role of the Access Point is to isolate device heterogeneity considerations in a single architectural component.  By presenting a device-independent interface to the user and to the rest of the system, the Access Point converts a potentially large configuration space of handheld digital input devices into a small number of server-digestible data types.  To allow every wireless device to send data without manually loading device-specific software on the receiver or receiver-specific software on the device, the Access Point must be extensible. Although we have deliberately avoided a design that assumes that appliances can run Java (to avoid an artificial dependence on a not-yet-deployed technology), a mechanism such as Jini that supports downloading of communication protocol code into the Access Point could be explored to provide such extensibility.

Along with the actual data, the Access Point must obtain a user identifier and command-tag from the device and attach these to incoming data in a client-protocol-specific manner. These metadata are necessary for a number of reasons:

  1. Application selection: The command-tag names the high-level application that the user wants to perform on the data (e.g., "Send picture to my public_html directory"), using a binding mechanism described later.  However, the command-tag alone is not sufficient to define the application, since the same tag may mean different things to different users or be interpreted with different semantics (e.g. "My web site" maps to different URLs for different users). Thus, a user identifier is required to fully specify the desired application.
  2. Access control: The user identifier is required to determine what credentials are to be attached to the application request.  For example, a user should be allowed to push data into her own public_html directory, but not necessarily those of other users.  Furthermore, some services may be accessible only to authorized users. Thus, the system needs some identifier to attach credentials to the request.
  3. Other service features requiring a command-tag and user identifier include billing, security, and personalization.  Although we have not investigated the implementation of such features, we have left the user identifier as a necessary "hook" for adding these capabilities later.

A simple example of a command tag is text metadata that results from the user choosing a particular menu item on the device. A more sophisticated tag is possible with recent models of digital cameras that allow the embedding of audio-coded metadata in each image. In the latter example, the Access Point receives images from the camera and extracts the audio metadata for use as the command-tag. We also hope to explore merging of command-tag inputs from one device with data from another device.

Some devices have such limited user interfaces that there is no graceful way to specify a different command-tag with each device input.  In this case, the Access Point attaches the special command tag "default" to the incoming data.  We describe later why this is important architecturally.

The Access Point as we have described it is stateless and configuration-less, which makes it appealing to deploy as (e.g.) a publicly available kiosk or Web-centric service. In fact, the Access Point looks very much like a reverse Web-browser, its role being to read device-specific data and write MIME-typed data streams to the rest of the system.
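As a rough illustration of the tagging just described, the following Python sketch shows one way an Access Point might package device data with a user identifier and command-tag, falling back to the special "default" tag when the device supplies none. The names `TaggedInput` and `tag_input` and the field layout are hypothetical; they are not taken from the prototype.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_TAG = "default"  # the special command tag named in the text


@dataclass(frozen=True)
class TaggedInput:
    """Hypothetical envelope handed from the Access Point to the rest of the system."""
    user_id: str
    command_tag: str
    mime_type: str
    payload: bytes


def tag_input(user_id: str, mime_type: str, payload: bytes,
              command_tag: Optional[str] = None) -> TaggedInput:
    """Attach metadata to device data; use the default tag if none was supplied."""
    cleaned = command_tag.strip() if command_tag else ""
    return TaggedInput(user_id, cleaned or DEFAULT_TAG, mime_type, payload)
```

Because the envelope carries everything later stages need, the Access Point itself can remain stateless in this sketch.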

Service Modules and Application Controller

Once the typed data, command, and user identifier are available from the Access Point, they are handed off to the application infrastructure for application execution.   In our system the "application" relies on standalone or composable Internet service modules; for example, an image format translation service might take an image and some parameters as input, and deliver the image in a different encoding.  Since ADS is an application architecture and not a service framework per se, our architecture does not define which service framework should be used to construct and execute the application.   We believe there are good engineering reasons to construct the application out of composable building blocks, but nothing in our architecture requires this.   Architecturally, it suffices to distinguish one service module that acts as an ADS application controller, which must have the following functionality:

We re-emphasize that there are many possibilities for executing code in the infrastructure, and we do not prescribe any particular method. Depending on the mechanisms used, the Application Controller may be a separate service module, or it may simply be a designated entry point into a piece of monolithic code in a larger service module, perhaps executed directly by an HTTP server as a CGI-bin script. However, the two Application Controller tasks of command canonicalization and command resolution are fundamental to our architecture, so we describe these in more detail before moving on to the description of specific mechanisms used in our implemented prototype.

Command Canonicalization

The reason that command canonicalization and late binding are fundamental concepts in the architecture, although the implementation media may vary, is that together they effectively address the problem of command-tagging without complicating the user experience. We first describe how each of the two architectural mechanisms works, and then give examples illustrating how they support an unobtrusive user experience.

Much recent work on information delivery (web output) has focused on sophisticated data transformation services in the infrastructure.  We have already described some ADS applications that make use of data transformation. Less obviously, but advantageously from the point of view of providing natural extensions to device UI's, we can also leverage the transformation infrastructure to transform command tags into a canonical form. For example, recent digital camera models such as the Kodak DC265 allow the embedding of audio-coded metadata in each image, in the form of a short audio clip. We can use an infrastructure speech-to-text service (operating against a fixed and very limited size vocabulary) to transform a spoken command word or phrase into a string, which is then looked up in the template database.

Command transformation is appealing because it decouples the method used to specify commands from the resolution of commands for application selection. Command transformation potentially allows each device's UI to be extended for command-tagging in the way that is most natural for that device.
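A minimal sketch of canonicalization, under stated assumptions: text tags are normalized directly, while other tag types are passed to a transformation service chosen by MIME type. The `Transcriber` callable is a hypothetical stand-in for an infrastructure speech-to-text service; no real recognizer is implemented here.

```python
from typing import Callable, Dict

# Hypothetical stand-in for an infrastructure transformation service
# (e.g. speech-to-text over a small fixed vocabulary).
Transcriber = Callable[[bytes], str]


def canonicalize(tag: bytes, mime_type: str,
                 transcribers: Dict[str, Transcriber]) -> str:
    """Reduce a device-specific command tag to a canonical lowercase string."""
    if mime_type.startswith("text/"):
        text = tag.decode("utf-8")
    elif mime_type in transcribers:
        text = transcribers[mime_type](tag)
    else:
        raise ValueError(f"no canonicalizer registered for {mime_type}")
    # Collapse case and whitespace so "Photo  Album" and "photo album" match.
    return " ".join(text.lower().split())
```

The caller never needs to know whether the command arrived as menu text or as an audio clip, which is the decoupling described above.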

Command Resolution Using Late Binding

Commands are "bound" to application descriptions in a separate template database, itself accessible via an infrastructural (e.g. Web) interface. This Web-accessible database is used to resolve a command into a script template ("work order" in our prototype, but machinery may vary according to the infrastructure framework used). Late binding of commands in a separate database contributes to a "no-futz" user experience in at least two ways.

First, users can change or add command behaviors by modifying the database directly through a familiar Web interface.  Even if an appliance is Web-configurable (e.g. using vendor-supplied software), centralized configuration frees users from having to configure each device independently.  In fact, we envision third-party template databases that free the average user from worrying about how to construct new behaviors. Late binding therefore expands the repertoire of potential behaviors available to services, without burdening each device with the obligation of supporting a UI flexible enough to distinguish among all available commands, perhaps presenting only a subset at any given time.

Second, by modifying the binding of the special command tag "default", a user can modify the behavior of data coming from devices with non-extensible UI's. Thus, for digital cameras that provide no convenient metadata mechanism in which a command can be embedded to accompany a photo, the user can simply re-bind the default behavior to a new application in the command database, which has the same result as redefining the camera's (non-configurable) behavior.

A diagram of command transformation using late binding.
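Late binding can be sketched as a lookup performed at request time against a table keyed by user identifier and canonical command. The in-memory dictionary below is a hypothetical stand-in for the Web-accessible template database, and the template names are invented for illustration.

```python
from typing import Dict, Tuple

# Hypothetical stand-in for the template database.
# Keys are (user identifier, canonical command); values name a script template.
TemplateDB = Dict[Tuple[str, str], str]


def resolve(db: TemplateDB, user_id: str, command: str) -> str:
    """Resolve a command to a template when the data arrives, not beforehand."""
    try:
        return db[(user_id, command)]
    except KeyError:
        raise KeyError(f"no binding for {command!r} and user {user_id!r}") from None


db: TemplateDB = {("alice", "default"): "photo-album"}
before = resolve(db, "alice", "default")
db[("alice", "default")] = "science-fair"  # the user re-binds "default"
after = resolve(db, "alice", "default")
```

The device sends the same "default" tag both times; only the database entry changed, which is how a non-configurable device acquires new behavior.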

Prototype Implementation

This section describes the implementation status of the components -- Access Point, Application Controller, and Modular Services -- shown in the figure below. To make the description more concrete, the components are described in the context of the implemented application, a conference attendee list Web site. This application involves the user taking pictures and scanning business cards of conference attendees to be published on the conference Web site.

Block diagram of an Appliance Data Services implementation. Dashed lines are control information; solid lines are data. Each box represents a web server.

Access Point

After the user has taken a picture of an attendee, the user can point the camera's IR port at the Access Point, which is implemented on an IR-equipped laptop running Windows 98. The user then pushes a transfer button on the camera or scanner, and the image is received by the Access Point.

Our prototype AP supports digital cameras, handheld scanners, and PalmOS devices that use Hewlett-Packard's JetSend IR data transfer protocol. On the Access Point's output, an HTTP POST is used to notify the Application Controller of the received data. 
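The notification might look something like the following sketch, which builds (but does not send) a form-encoded POST. The field names, the URLs, and the idea of passing a retrieval URL rather than the data itself are assumptions made for illustration; the text does not specify the prototype's message format.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_notification(controller_url: str, user_id: str, command_tag: str,
                       mime_type: str, data_url: str) -> Request:
    """Build a POST telling the controller what arrived and where to fetch it."""
    fields = {
        "user": user_id,         # hypothetical field names throughout
        "command": command_tag,
        "type": mime_type,
        "source": data_url,
    }
    body = urlencode(fields).encode("ascii")
    return Request(controller_url, data=body, method="POST",
                   headers={"Content-Type": "application/x-www-form-urlencoded"})
```

Passing the resulting object to `urllib.request.urlopen` would actually send it.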

Application Controller

The Application Controller is implemented as an HTTP-compliant server that accepts POST requests from the Access Point.  When the Application Controller receives notification of data being received at the Access Point, the Controller performs the following steps:

  1. determines which data conversions are necessary,
  2. locates services that implement such conversion,
  3. locates the storage service where the conference attendee page resides, and
  4. invokes these services on the data.

These steps illustrate the kind of processing we expect the Application Controller to perform after it has used the command-tag to select the application. Since implementation of command-tag canonicalization and command resolution is still in progress, the name of the attendee list application is coded directly in the Controller. The application consists of Java code that attempts to POST pairs of JPEG files to a storage web site.

The determination of data conversions is done simply by requesting (or "discovering" as it is usually termed) services with attributes matching the output type, JPEG, to the input types the Access Point sends. In this case, the input from the camera will require no conversion and the input from the scanner will require conversion from TIFF to JPEG. Note that the application has very little logic for this step: the data types are simply fixed by the input devices and by the requirements of the final web page.

Service discovery is accomplished using an implementation of the Service Location Protocol (SLP) [slp] from HP's ChaiServer system [chai]. This service directory is populated by executing processes on hosts in the network that register themselves according to the SLP procedure. Our service descriptions are arbitrary at this point; one of the important issues for experimentation is what kinds of service descriptions can be effective for a wide range of applications.
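The conversion-planning and discovery steps can be sketched together. The list below is a hypothetical in-memory stand-in for the service directory; it does not model SLP registration or lookup, and the attribute names and URLs are invented.

```python
from typing import Dict, List

Service = Dict[str, str]

# Hypothetical directory contents; real entries would come from registration.
DIRECTORY: List[Service] = [
    {"url": "http://convert.example/tiff2jpeg",
     "input": "image/tiff", "output": "image/jpeg"},
    {"url": "http://store.example/attendees",
     "input": "image/jpeg", "output": "stored"},
]


def discover(directory: List[Service], **attrs: str) -> List[Service]:
    """Return the services whose attributes match every requested value."""
    return [s for s in directory
            if all(s.get(k) == v for k, v in attrs.items())]


def plan(directory: List[Service], input_types: List[str],
         target: str) -> List[Service]:
    """Pick a converter for each input type that differs from the target type."""
    steps = []
    for src in input_types:
        if src == target:
            continue  # e.g. camera JPEG needs no conversion
        matches = discover(directory, input=src, output=target)
        if not matches:
            raise LookupError(f"no service converts {src} to {target}")
        steps.append(matches[0])
    return steps
```

For a JPEG photo and a TIFF scan destined for a JPEG page, `plan` returns only the TIFF-to-JPEG converter.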

Using the service descriptions returned by the SLP directory, an XML workorder is created in the Application Controller. Our preliminary XML format contains one entry for each "job" to be performed to accomplish the high-level application. Within each entry are service-specific arguments, such as input/output types, destination, and so forth. The particular format and fields chosen are not a general solution for composable services. Rather, XML and this format were chosen for their expressiveness and simplicity.
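A workorder of this general shape could be assembled as in the sketch below. The element and attribute names (`workorder`, `job`, `status`, `service`, `source`, `input`, `output`) are hypothetical; the text does not give the prototype's actual schema.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def build_workorder(jobs: List[Dict[str, str]]) -> str:
    """Serialize one <job> entry per step, in the order they should run."""
    root = ET.Element("workorder")
    for job in jobs:
        entry = ET.SubElement(root, "job", status="pending")
        for field in ("service", "source", "input", "output"):
            ET.SubElement(entry, field).text = job.get(field, "")
    return ET.tostring(root, encoding="unicode")


workorder = build_workorder([
    {"service": "http://convert.example/tiff2jpeg",
     "source": "http://ap.example/data/scan1.tif",
     "input": "image/tiff", "output": "image/jpeg"},
    {"service": "http://store.example/attendees",
     "input": "image/jpeg", "output": "stored"},
])
```

The second job's source is left empty here because it is not known until the first job has produced its output.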

Finally, the act of invoking the application simply involves doing an HTTP POST of the workorder to the first service to be performed on the data. In the case of the attendee list application, the workorder is sent to a service that can perform TIFF-to-JPEG conversions.

Composable Services

Our services are simple HTTP daemons modified to recognize special path values in POST requests and invoke a special code path when a matching POST is seen.  That code parses the XML workorder and performs the indicated operation, which in the case of the attendee list application consists of TIFF-to-JPEG conversion.  This will cause the service to send an HTTP GET request to the preceding service host or to the Access Point host to retrieve the data to be converted.  When conversion is complete, the service daemon stores the converted data locally and edits the XML document to change the source field of the next entry in the workorder to point to the converted data. This enables the next service to retrieve the converted file using HTTP GET.  The service then POSTs this modified workorder to the service specified in the next entry.

The current entry is also labelled as finished; leaving finished entries in the XML document allows the workorder to be used for such things as intermediate status information, bookkeeping, billing, and possibly, error detection.
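The workorder bookkeeping performed by each service can be sketched as follows. This assumes a hypothetical workorder made of `job` elements carrying a `status` attribute and `service` and `source` children; the conversion itself and the HTTP transfers are omitted.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple


def advance(workorder_xml: str, result_url: str) -> Tuple[str, Optional[str]]:
    """Mark the current job finished and point the next job at the new data.

    Returns the edited workorder and the URL of the next service to POST it
    to, or None when no jobs remain.
    """
    root = ET.fromstring(workorder_xml)
    pending = [j for j in root.findall("job") if j.get("status") != "finished"]
    if not pending:
        raise ValueError("workorder has no pending job")
    pending[0].set("status", "finished")  # entry is kept for bookkeeping
    next_service = None
    if len(pending) > 1:
        source = pending[1].find("source")
        if source is None:
            source = ET.SubElement(pending[1], "source")
        source.text = result_url  # next service will GET its input from here
        next_service = pending[1].findtext("service")
    return ET.tostring(root, encoding="unicode"), next_service
```

Since finished entries stay in the document, the workorder that reaches the last service doubles as a record of every step already taken.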

Discussion

We set out to explore the potential for combining the new generation of digital input devices with emerging Internet services but using web infrastructure for input rather than output. Our experience in constructing the Science Fair web site without infrastructure support convinced us that such support would be essential for successful integration. Our experience in trying to build such infrastructure has been mixed. The Access Point prototype works, but it is currently not structured in an extensible way.  We need a mechanism for plugging additional protocol modules into the Access Point, perhaps using Java for portability and modularity.  A harder problem is that the Access Point must ultimately support the most widely deployed protocols, which implies a potentially large development effort.

The Application Controller is still in an early stage; the balance between end-user ease-of-use and application generality will be a great challenge. The modular services interconnected with HTTP and XML turned out to be surprisingly simple to implement. Our attendee list application revealed many issues in robustness and usability that remain to be tackled, some of which we describe in the Open Issues section.

There is also a larger issue to consider. This web-input experience might be considered inappropriate for setting the requirements for Appliance Data Services. Arguably, a school science fair or conference attendee list is not a typical "killer app", and the constraints imposed by the experience might not represent the bulk of applications of web-based data-input service systems in the future. However, we argue that this is instead a glimpse at the future on two grounds:

The Web's simplicity allowed it to quickly evolve from a simple way for researchers to publish static content to a universal interface for sophisticated services such as banking, shopping, and mapping. By analogy, we hope that ADS--which "runs the Web backwards"--will make the 'trivial' task of data-input simple enough that applications that leverage it become more widespread and compelling.

Open Issues

ADS is an early prototype and a first step toward infrastructure support for appliance data services.  Although we believe we have identified some fundamental architectural mechanisms for such applications, we have barely begun to explore the issues involved in making ADS "real":

Failure Semantics: An issue we have yet to tackle is how to report success or failure of the application to the user. This is a problem of semantics, not just implementation: only an end-to-end indicator of application success or failure is likely to be useful ("The photo got posted" or "it didn't"), but in some cases the application may be sufficiently long-running that it is unreasonable to expect the user to wait for an end-to-end check. A real concern is making sure the user's expectations are set correctly: in the digital-camera scenario, if the user successfully injects camera data into the infrastructure, the user may feel it is then safe to erase the camera's memory. In fact this is only safe if the application can make some guarantee about the persistence of the injected data, if not the success of complete application execution, so that recovery can be attempted later. We speculate that the application logic could include a separate end-to-end acknowledgment delivered to the user "out of band" with respect to ADS; perhaps the user can be sent email upon successful completion of the application.

Security and Privacy: Although we envision the Access Point as a public shared resource, the user identifier accompanying the data entering the Access Point is sufficient to bootstrap a secure connection to the rest of the infrastructure. Because we have not yet investigated how best to provide secure and private service, we have deliberately avoided specifying the format of the user identifier. It might, for example, consist of an identification token accompanied by a challenge/response pair that the AP can use to authenticate itself to the Application Controller. This mechanism, which does not require a user's secret key to be revealed to the AP, is analogous to the mechanism used for roaming in the GSM cellular telephone system. In any case, the user has to trust the Access Point not to maliciously eavesdrop or tamper with the data coming from the device. Since users currently appear to be willing to read their email on shared kiosks in airports and hotels, we do not expect trusting the Access Point to be a major obstacle to deployment.
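One possible reading of the challenge/response idea is sketched below. It assumes that the device and the Application Controller share a per-user secret and that the response is an HMAC of a random challenge; both are assumptions made for illustration, not a format proposed in the text, and the function names are hypothetical. Replay protection is deliberately left out.

```python
import hashlib
import hmac
import os
from typing import Tuple


def make_pair(user_secret: bytes) -> Tuple[bytes, bytes]:
    """On the user's device: produce a (challenge, response) pair for the AP."""
    challenge = os.urandom(16)
    response = hmac.new(user_secret, challenge, hashlib.sha256).digest()
    return challenge, response


def verify_pair(user_secret: bytes, challenge: bytes, response: bytes) -> bool:
    """At the controller, assumed to hold the same secret: check the pair."""
    expected = hmac.new(user_secret, challenge, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)
```

The Access Point only ever handles the pair, so in this sketch it can vouch for the user without learning the secret.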

Conclusion

Our overarching goal has been to enable the same level of innovation in connecting input-centric appliances to the Web as has been achieved in the last few years for Web information delivery appliances. To this end, we identified specific design challenges and proposed solutions to them in the context of an enabling architecture:

We hope and anticipate that our initial work on ADS will encourage others to leverage the Web to magnify the usefulness of input centric appliances.


References