Summary: HaLoop: Efficient Iterative Data Processing on Large Clusters
Yingyi Bu, Bill Howe, Magdalena Balazinska, Michael D. Ernst
Department of Computer Science and Engineering
University of Washington, Seattle, WA, U.S.A.
yingyib@ics.uci.edu, {billhowe, magda, mernst}@cs.washington.edu
ABSTRACT
The growing demand for large-scale data mining and data analysis applications has led both industry and academia to design new types of highly scalable data-intensive computing platforms. MapReduce and Dryad are two popular platforms in which the dataflow takes the form of a directed acyclic graph of operators. These platforms lack built-in support for iterative programs, which arise naturally in many applications, including data mining, web ranking, graph analysis, and model fitting. This paper presents HaLoop, a modified version of the Hadoop MapReduce framework that is designed to serve these applications. HaLoop not only extends MapReduce with programming support for iterative applications, but it also dramatically improves their efficiency