A Study of Skew in MapReduce Applications YongChul Kwon, Magdalena Balazinska, Bill Howe
 

Summary: A Study of Skew in MapReduce Applications
YongChul Kwon, Magdalena Balazinska, Bill Howe
University of Washington, USA
Email:{yongchul,magda,billhowe}@cs.washington.edu
Jerome Rolia
HP Labs
Email: jerry.rolia@hp.com
Abstract--This paper presents a study of skew -- highly variable
task runtimes -- in MapReduce applications. We describe
various causes and manifestations of skew as observed in real-world
Hadoop applications. Runtime task distributions from
these applications demonstrate the presence and negative impact
of skew on performance behavior. We discuss best practices
recommended for avoiding such behavior and their limitations.
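
As a rough illustration of the definition above (not taken from the paper), the sketch below summarizes how uneven the task runtimes of a single phase are. It assumes all tasks of the phase start together, so the phase cannot finish before its slowest task does. The function name `skew_summary` and the sample runtimes are hypothetical.

```python
from statistics import mean


def skew_summary(task_runtimes):
    """Summarize runtime skew for the tasks of one phase (map or reduce).

    Assumption: all tasks start at the same time, so the phase
    finishes only when the slowest task finishes.
    """
    slowest = max(task_runtimes)
    average = mean(task_runtimes)
    return {
        "slowest": slowest,
        "mean": average,
        # 1.0 means perfectly even runtimes; larger values mean more skew.
        "slowest_to_mean": slowest / average,
    }


# Hypothetical task runtimes in seconds: four even tasks and one straggler.
runtimes = [10, 10, 10, 10, 60]
print(skew_summary(runtimes))
```

A ratio well above 1.0 indicates that a few slow tasks dominate the phase's completion time, which is the kind of behavior the study examines.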
I. INTRODUCTION
MapReduce [1] has proven itself as a powerful and cost-effective
approach for massively parallel analytics [2]. A
MapReduce job runs in two main phases: a map phase and a
reduce phase. In each phase, a subset of the input data is
processed by distributed tasks in a cluster of computers. When

  

Source: Anderson, Richard - Department of Computer Science and Engineering, University of Washington at Seattle
Balazinska, Magdalena - Department of Computer Science and Engineering, University of Washington at Seattle

 

Collections: Computer Technologies and Information Sciences