Practical Lessons in Supporting Large-Scale Computational Science

Ron Musick, Terence Critchlow
Center for Applied Scientific Computing
Lawrence Livermore National Laboratory
P.O. Box 808, L-561, Livermore, CA 94551
{rmusick, critchlow}@llnl.gov

Abstract

Business needs have driven the development of commercial database systems since their inception. As a result, there has been a strong focus on supporting many users, minimizing the potential corruption or loss of data, and maximizing performance metrics such as transactions-per-second and benchmark results [Gra93]. These goals have little to do with supporting business intelligence needs such as the decision support and data mining activities common in on-line analytic processing (OLAP) applications. As a result, business data are typically off-loaded to secondary systems before these activities occur. The same goals also have little to do with the needs of the scientific community, which typically revolve around a great deal of compute- and I/O-intensive analysis, often over large data with high dimensionality. In many cases scientific data were never collected in a DBMS in the first place, so analysis and visualization take place over specialized flat-file formats. This is a painful solution, because a DBMS has much to offer in the overall process of managing and exploring data.

Of late, industry and the research community have been pushing to develop DBMS-based systems that will break this mold, and provide the needed OLAP support. The recent activity in OLAP [GC97, OLA98], multi-dimensional databases [TD96], ORDBMS [SM96], and the TPC council's TPC-D [TPC98] benchmark all testify to the strength of this new direction. This is a promising change of focus. OLAP optimizations are much closer than on-line transaction processing (OLTP) to supporting the interactive computational data analysis (ICDA) activities that take place in scientific domains [MM97]. OLAP and ICDA do not, however, represent identical workloads. In fact, little is known about exactly how DBMS technology fails to meet ICDA needs. We explore this issue in some depth, describing an evaluation of DBMS technology for large, high-dimensional computational data (see [Mus99] for more detail). After extensive testing, we can report that the technology is much closer to being able to support ICDA than one might expect. Furthermore, there is a clear evolutionary path that should lead to full support once the technology matures.

The main function of this report, in lieu of stable and well-known benchmarks for ICDA, is to provide a practical evaluation of the current state of DBMS technology. In Section 2 we describe the characteristics of ICDA data and workloads, while Section 3 explains the evaluation criteria. Section 4 contains the bulk of the evaluation results, focussing first on relational databases, then discussing the newer object-relational approaches that have appeared commercially in the past few years. We conclude with future directions and research that may finally integrate ICDA and mainstream database management systems.

Appeared in

Sigmod Record, V28 #4, December 1999.
