Summary: Estimating the Progress of MapReduce Pipelines
Kristi Morton, Abram Friesen, Magdalena Balazinska, Dan Grossman
Computer Science and Engineering Department, University of Washington
Seattle, Washington, USA
{kmorton,afriesen,magda,djg}@cs.washington.edu
Abstract: In parallel query-processing environments, accurate,
time-oriented progress indicators could provide much utility
given that inter- and intra-query execution times can have high
variance. However, none of the techniques used by existing tools
or available in the literature provide non-trivial progress estimation
for parallel queries. In this paper, we introduce Parallax,
the first such indicator. While several parallel data processing
systems exist, the work in this paper targets environments where
queries consist of a series of MapReduce jobs. Parallax builds
on recently developed techniques for estimating the progress of
single-site SQL queries, but focuses on the challenges related to
parallelism and variable execution speeds. We have implemented
our estimator in the Pig system and demonstrate its performance
through experiments with the PigMix benchmark and other
queries running in a real, small-scale cluster.