Summary: Compiler and Runtime Support for Shared Memory
Parallelization of Data Mining Algorithms
Xiaogang Li Ruoming Jin Gagan Agrawal
Department of Computer and Information Sciences
Ohio State University, Columbus OH 43210
{xgli,jinr,agrawal}@cis.ohio-state.edu
Abstract. Data mining techniques focus on finding novel and useful patterns or
models from large datasets. Because of the volume of the data to be analyzed, the
amount of computation involved, and the need for rapid or even interactive
analysis, data mining applications require the use of parallel machines. We have
been developing compiler and runtime support for building scalable
implementations of data mining algorithms. Our work encompasses shared memory
parallelization, distributed memory parallelization, and optimizations for
processing disk-resident datasets.
In this paper, we focus on compiler and runtime support for shared memory
parallelization of data mining algorithms. We have developed a set of
parallelization techniques that apply across algorithms for a variety of mining
tasks. We describe the interface of the middleware where these techniques are
implemented. Then, we present compiler techniques for translating data parallel
code to the middleware specification. Finally, we present a brief evaluation of
our compiler using