HaLoop: Efficient Iterative Data Processing on Large Clusters
 

Summary: HaLoop: Efficient Iterative Data Processing on Large Clusters
Yingyi Bu, Bill Howe, Magdalena Balazinska, Michael D. Ernst
Department of Computer Science and Engineering
University of Washington, Seattle, WA, U.S.A.
yingyib@ics.uci.edu, {billhowe, magda, mernst}@cs.washington.edu

ABSTRACT
The growing demand for large-scale data mining and data analysis applications has led both industry and academia to design new types of highly scalable data-intensive computing platforms. MapReduce and Dryad are two popular platforms in which the dataflow takes the form of a directed acyclic graph of operators. These platforms lack built-in support for iterative programs, which arise naturally in many applications including data mining, web ranking, graph analysis, model fitting, and so on. This paper presents HaLoop, a modified version of the Hadoop MapReduce framework that is designed to serve these applications. HaLoop not only extends MapReduce with programming support for iterative applications, it also dramatically improves their efficiency

  

Source: Anderson, Richard - Department of Computer Science and Engineering, University of Washington at Seattle
Ernst, Michael - Department of Computer Science and Engineering, University of Washington at Seattle

 

Collections: Computer Technologies and Information Sciences