Scalable Data Mining through Fine-Grained Parallelism: the Present and the Future

Chandrika Kamath, Ron Musick
Center for Applied Scientific Computing
Lawrence Livermore National Laboratory
P.O. Box 808, L-561, Livermore, CA 94551
{kamath2, rmusick}@llnl.gov

Abstract

Many organizations, both scientific and commercial, are routinely gathering data at an ever-increasing pace. To make full use of the information in this data, it is becoming clear that we need data mining techniques that are scalable, that is, techniques that can use additional computational resources effectively to solve increasingly larger problems. In this chapter, we review ways in which fine-grained parallelism, using multiple tightly coupled processors, can be used to build accurate models quickly. Focusing on both the data preparation and the pattern recognition steps, we survey the current state of the art in this rapidly evolving field of parallel data mining and explore ways in which we can benefit from developments in related areas. Looking to the future, we discuss areas where additional research is needed to make parallel, fine-grained data mining a viable means of exploring large data sets.

Appeared in

Advances in Distributed Data Mining, Eds. Hillol Kargupta and Philip Chan, AAAI Press, Spring 2000.
