Ad hoc Query Support for Very Large Simulation Mesh Data: the Metadata Approach
Byung S. Lee, Robert R. Snapp
Department of Computer Science
University of Vermont
{bslee, snap}@cs.uvm.edu
Terence Critchlow
Center for Applied Scientific Computing
Lawrence Livermore National Laboratory
P.O. Box 808, L-561, Livermore, CA 94551
critchlow@llnl.gov
Ron Musick*
iKuni, Inc.
Palo Alto, CA 94304
musick@ikuni.com
Abstract
We present our approach to enabling approximate ad hoc queries on
terabyte-scale mesh data generated from large scientific simulations
through the extension and integration of database, statistical, and
data mining techniques. There are several significant barriers to
overcome in achieving this objective. First, large-scale simulation
data is already at the multi-terabyte scale and growing quickly, thus
rendering traditional forms of interactive data exploration and query
processing untenable. Second, a priori knowledge of user queries is
not available, making it impossible to tune special-purpose solutions.
Third, the data has spatial and temporal aspects, as well as
arbitrarily high dimensionality, which exacerbates the task of finding
compact, accurate and easy-to-compute data models.
Our approach is to preprocess the mesh data to generate highly
compressed, lossy models that are used in lieu of the original data to
answer users' queries. This approach raises three challenges. First,
the model (equivalently, the content-oriented metadata) being
generated must be smaller than the original data by at least an order
of magnitude. Second, the metadata must contain enough information to
support a broad class of queries. Finally, the accuracy and speed of
the queries must be within the tolerances required by users. In this
paper we give an overview of ongoing development efforts with an
emphasis on extracting metadata and using it in query processing.
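
The idea of answering a query from compact metadata rather than from the
original data can be illustrated with a minimal sketch. It is not the
authors' method: the per-block summary (count, minimum, maximum), the
names BlockSummary, build_metadata and estimate_count_above, and the
assumption that values are spread uniformly within each block are all
hypothetical choices made only for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class BlockSummary:
    """Hypothetical lossy metadata kept for one block of mesh values."""
    count: int
    minimum: float
    maximum: float


def build_metadata(values: Sequence[float], block_size: int) -> List[BlockSummary]:
    """Preprocessing step: replace each block of raw values by a 3-number summary."""
    summaries = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        summaries.append(BlockSummary(len(block), min(block), max(block)))
    return summaries


def estimate_count_above(metadata: List[BlockSummary], threshold: float) -> float:
    """Approximate 'how many values exceed threshold?' using only the metadata.

    Assumption (illustrative only): within a block, values are spread
    uniformly between the block's minimum and maximum.
    """
    estimate = 0.0
    for s in metadata:
        if s.minimum > threshold:
            # Every value in the block qualifies.
            estimate += s.count
        elif s.maximum > threshold:
            # Here minimum <= threshold < maximum, so the range is nonzero.
            fraction = (s.maximum - threshold) / (s.maximum - s.minimum)
            estimate += s.count * fraction
        # Otherwise no value in the block can exceed the threshold.
    return estimate


if __name__ == "__main__":
    raw = [0.0, 1.0, 2.0, 3.0, 10.0, 11.0, 12.0, 13.0]
    metadata = build_metadata(raw, block_size=4)
    print(estimate_count_above(metadata, threshold=1.5))
```

In this sketch each block is reduced to three numbers, so the metadata is
smaller than the raw data by roughly a factor of block_size / 3. The
trade-off between block size, metadata size and estimation error mirrors
the size and accuracy requirements stated in the abstract.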
Best Paper Nominee at the
Brazilian Symposium on Databases, Rio de Janeiro, Brazil, October 2001.
Reprinted in the
Journal of the Brazilian Computer Society, Vol. 8, No. 1, July 2002.
* Work done while the author was at
Center for Applied Scientific Computing
Lawrence Livermore National Laboratory
P.O. Box 808, L-561, Livermore, CA 94551