Improving MapReduce Performance in Heterogeneous Environments
Matei Zaharia, Andy Konwinski, Anthony D. Joseph, Randy Katz, Ion Stoica
 

University of California, Berkeley
{matei,andyk,adj,randy,stoica}@cs.berkeley.edu
Abstract
MapReduce is emerging as an important programming model for large-scale
data-parallel applications such as web indexing, data mining, and
scientific simulation. Hadoop is an open-source implementation of
MapReduce enjoying wide adoption and is often used for short jobs where
low response time is critical. Hadoop's performance is closely tied to
its task scheduler, which implicitly assumes that cluster nodes are
homogeneous and tasks make progress linearly, and uses these assumptions
to decide when to speculatively re-execute tasks that appear to be
stragglers. In practice, the homogeneity assumptions do not always hold.
An especially compelling setting where this occurs is a virtualized data
center, such as Amazon's Elastic Compute Cloud (EC2). We show that
Hadoop's scheduler can cause severe performance

  

Source: Akella, Aditya - Department of Computer Sciences, University of Wisconsin at Madison

 

Collections: Computer Technologies and Information Sciences