Estimating the Progress of MapReduce Pipelines Kristi Morton, Abram Friesen, Magdalena Balazinska, Dan Grossman
 

Summary: Estimating the Progress of MapReduce Pipelines
Kristi Morton, Abram Friesen, Magdalena Balazinska, Dan Grossman
Computer Science and Engineering Department, University of Washington
Seattle, Washington, USA
{kmorton,afriesen,magda,djg}@cs.washington.edu
Abstract: In parallel query-processing environments, accurate,
time-oriented progress indicators could provide much utility given
that inter- and intra-query execution times can have high variance.
However, none of the techniques used by existing tools or available
in the literature provide non-trivial progress estimation for
parallel queries. In this paper, we introduce Parallax, the first
such indicator. While several parallel data processing systems exist,
the work in this paper targets environments where queries consist of
a series of MapReduce jobs. Parallax builds on recently-developed
techniques for estimating the progress of single-site SQL queries,
but focuses on the challenges related to parallelism and variable
execution speeds. We have implemented our estimator in the Pig system
and demonstrate its performance through experiments with the PigMix
benchmark and other queries running in a real, small-scale cluster.

  

Source: Anderson, Richard - Department of Computer Science and Engineering, University of Washington at Seattle
Balazinska, Magdalena - Department of Computer Science and Engineering, University of Washington at Seattle

 

Collections: Computer Technologies and Information Sciences