Use of Numerical Models as Data Proxies for Approximate Ad-Hoc Query Processing

R. Kamimura, G. Abdulla, T. Critchlow, C. Baldwin, I. Lozares, N. Tang
Center for Applied Scientific Computing
Lawrence Livermore National Laboratory
P.O. Box 808, L-561, Livermore, CA 94551

B. Lee
Department of Computer Science
University of Vermont

Ron Musick*
iKuni, Inc.
Palo Alto, CA 94304
musick@ikuni.com

Abstract

As datasets grow beyond the gigabyte scale, there is an increasing demand for techniques to manage and interact with them. To this end, the DataFoundry team at Lawrence Livermore National Laboratory has developed a software prototype called the Approximate Ad-Hoc Query Engine for Simulation Data (AQSim). The goal of AQSim is to provide a framework that allows scientists to interactively perform ad-hoc queries over terabyte-scale datasets, using numerical models as proxies for the original data. This approach has several advantages. First, because only the model parameters are stored, each dataset occupies a smaller footprint than the original, extending the shelf-life of such datasets before they must be moved to archival storage. Second, the models are geared towards approximate querying: they are built at different resolutions, allowing the user to trade off model accuracy against query response time and giving the user greater opportunities for exploratory data analysis. Finally, several different models are supported, each focusing on a different characteristic of the data, thereby making the data easier to interpret than the original. The focus of this paper is on the modeling aspects of the AQSim framework.

Appeared in

Proceedings of the 7th Joint Conference on Information Sciences, Cary, NC, Sept. 2003.


* Work done while the author was at
Center for Applied Scientific Computing
Lawrence Livermore National Laboratory
P.O. Box 808, L-561, Livermore, CA 94551