Use of Numerical Models as Data Proxies for Approximate Ad-Hoc Query Processing
R. Kamimura, G. Abdulla, T. Critchlow, C. Baldwin, I. Lozares, N. Tang
Center for Applied Scientific Computing
Lawrence Livermore National Laboratory
P.O. Box 808, L-561, Livermore, CA 94551
B. Lee
Department of Computer Science
University of Vermont
Ron Musick*
iKuni, Inc.
Palo Alto, CA 94304
musick@ikuni.com
Abstract
As datasets grow beyond the gigabyte scale, there is an increasing
demand for techniques to manage and interact with them. To this end,
the DataFoundry team at the Lawrence Livermore National Laboratory has
developed a software prototype called Approximate Adhoc Query Engine
for Simulation Data (AQSim). The goal of AQSim is to provide a
framework that allows scientists to interactively perform ad hoc
queries over terabyte-scale datasets using numerical models as proxies
for the original data. The system has several advantages. First,
because only the model parameters are stored, each dataset occupies a
smaller footprint than the original, which extends its shelf life
before it is moved to archival storage. Second, the models are geared
towards approximate querying: they are built at different resolutions,
allowing the user to trade off model accuracy against query response
time. This gives the user greater opportunities for exploratory data
analysis. Lastly, several different models are allowed, each focusing
on a different characteristic of the data, thereby making the data
easier to interpret than the original. The focus of this paper is on
the modeling aspects of the AQSim framework.
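
The idea of answering a query from stored model parameters at a chosen
resolution can be illustrated with a minimal sketch. It assumes the
simplest possible model, a block mean, which is not necessarily one of
the models used in AQSim; the function names build_mean_pyramid and
approx_range_mean are hypothetical and exist only for this
illustration. Coarse levels store few parameters and answer quickly but
roughly; finer levels store more and answer more accurately.

```python
import numpy as np


def build_mean_pyramid(data, levels):
    """Summarize `data` by block means at several resolutions.

    Assumption for illustration: level k splits the data into 2**k
    equal blocks and keeps only each block's mean as its "model".
    """
    data = np.asarray(data, dtype=float)
    pyramid = []
    for k in range(levels):
        n_blocks = 2 ** k
        if data.size % n_blocks != 0:
            raise ValueError("data length must be divisible by 2**(levels - 1)")
        # One stored parameter (the mean) per block at this level.
        pyramid.append(data.reshape(n_blocks, -1).mean(axis=1))
    return pyramid


def approx_range_mean(pyramid, n, start, stop, level):
    """Estimate the mean of data[start:stop] from block means only.

    `n` is the length of the original data; requires start < stop.
    Each block contributes its mean, weighted by how many of its
    samples fall inside the half-open query range [start, stop).
    """
    means = pyramid[level]
    block = n // means.size
    edges = np.arange(means.size + 1) * block
    overlap = np.clip(
        np.minimum(edges[1:], stop) - np.maximum(edges[:-1], start), 0, None
    )
    return float((means * overlap).sum() / overlap.sum())


data = np.arange(16.0)
pyramid = build_mean_pyramid(data, levels=5)
coarse = approx_range_mean(pyramid, data.size, 4, 8, level=0)  # 1 parameter
fine = approx_range_mean(pyramid, data.size, 4, 8, level=4)    # 16 parameters
```

In this toy case the coarsest level answers from a single stored value,
while the finest level reproduces the exact mean of the queried range;
intermediate levels fall between the two in both storage and accuracy.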
Appeared in
Proceedings of the 7th Joint Conference on Information Systems, Cary,
NC, Sept. 2003.
* Work done while the author was at
Center for Applied Scientific Computing
Lawrence Livermore National Laboratory
P.O. Box 808, L-561, Livermore, CA 94551