Scalable Data Mining through Fine-Grained Parallelism: the Present and
the Future
Chandrika Kamath, Ron Musick
Center for Applied Scientific Computing
Lawrence Livermore National Laboratory
P.O. Box 808, L-561, Livermore, CA 94551
{kamath2, rmusick}@llnl.gov
Abstract
Many organizations, both scientific and commercial, are routinely gathering
data at an ever-increasing pace. To make full use of the
information in this data, it is becoming clear that we need data mining
techniques that are scalable, that is, techniques that can use additional
computational resources effectively to solve increasingly large
problems. In this chapter, we review ways in which fine-grained
parallelism, using tightly-coupled multiple processors, can be used to
build accurate models quickly. Focusing on both the data preparation and
the pattern recognition steps, we survey the current state of the art in
this rapidly evolving field of parallel data mining, and explore ways in
which we can benefit from developments in other related areas. Looking to
the future, we discuss areas where additional research is needed to make
parallel, fine-grained data mining a viable means of exploring large data
sets.
Appeared in Advances in Distributed Data Mining, Eds. Hillol Kargupta and
Philip Chan, AAAI Press, Spring 2000.