A Heterogeneity-Aware Task Scheduler for Spark
- Virginia Tech, Blacksburg, VA
- ORNL
Big data processing systems such as Spark are used in a growing number of diverse applications (e.g., machine learning, graph computation, and scientific computing), each with dynamic and differing resource needs. These applications increasingly run on heterogeneous hardware, e.g., with out-of-core accelerators. However, big data platforms do not account for the multi-dimensional heterogeneity of applications and hardware. The result is a fundamental mismatch between application and hardware characteristics on one side and the resource scheduling adopted by big data platforms on the other. For example, Hadoop and Spark consider only data locality when assigning tasks to nodes, and typically disregard hardware capabilities and their suitability for specific application requirements. In this paper, we present RUPAM, a heterogeneity-aware task scheduling system for big data platforms that considers both task-level resource characteristics and underlying hardware characteristics while preserving data locality. RUPAM adopts a simple yet effective heuristic to decide the dominant scheduling factor (e.g., CPU, memory, or I/O) for a given task in a particular stage. Our experiments show that RUPAM improves the performance of representative applications by up to 62.3% compared with the standard Spark scheduler.
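The abstract does not spell out RUPAM's heuristic, so the following Python is only a minimal sketch of the general idea under stated assumptions: each task's demand for CPU, memory, and I/O is known (e.g., profiled from earlier tasks in the same stage), the dominant factor is taken to be the resource whose demand is highest relative to the stage average, and a data-local node is kept unless it is much weaker on that factor than the best node. The names `Task`, `Node`, `dominant_factor`, `pick_node`, and the `locality_slack` threshold are hypothetical and are not taken from RUPAM or Spark.

```python
from dataclasses import dataclass, field

# Resource dimensions named in the abstract; modelling each as one scalar is an assumption.
RESOURCES = ("cpu", "memory", "io")


@dataclass
class Task:
    task_id: str
    demand: dict  # hypothetical per-resource demand, e.g. profiled from earlier tasks in the stage
    local_nodes: set = field(default_factory=set)  # nodes holding the task's input data


@dataclass
class Node:
    name: str
    capability: dict  # hypothetical relative hardware capability per resource


def dominant_factor(task, stage_avg):
    """Resource on which the task's demand is highest relative to the stage average."""
    return max(RESOURCES, key=lambda r: task.demand[r] / stage_avg[r])


def pick_node(task, nodes, stage_avg, locality_slack=0.8):
    """Pick a node for `task`, trading off hardware fit against data locality.

    The best data-local node is kept if its capability on the dominant factor is at
    least `locality_slack` times that of the best node overall (hypothetical rule).
    """
    factor = dominant_factor(task, stage_avg)
    best = max(nodes, key=lambda n: n.capability[factor])
    local = [n for n in nodes if n.name in task.local_nodes]
    if local:
        best_local = max(local, key=lambda n: n.capability[factor])
        if best_local.capability[factor] >= locality_slack * best.capability[factor]:
            return factor, best_local
    return factor, best


if __name__ == "__main__":
    nodes = [
        Node("n1", {"cpu": 1.0, "memory": 1.0, "io": 1.0}),
        Node("n2", {"cpu": 2.0, "memory": 0.5, "io": 1.0}),
        Node("n3", {"cpu": 1.0, "memory": 4.0, "io": 0.5}),
    ]
    stage_avg = {"cpu": 1.0, "memory": 2.0, "io": 50.0}
    task = Task("t0", {"cpu": 0.5, "memory": 6.0, "io": 40.0}, local_nodes={"n1"})
    factor, node = pick_node(task, nodes, stage_avg)
    print(factor, node.name)  # memory n3
```

Running the script prints `memory n3`: the task's memory demand is three times the stage average, and its data-local node `n1` offers only a quarter of `n3`'s memory capability, so locality is given up in favour of the better-matched node. Raising or lowering `locality_slack` shifts that trade-off.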
- Research Organization: Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
- Sponsoring Organization: USDOE
- DOE Contract Number: AC05-00OR22725
- OSTI ID: 1471879
- Resource Relation: Conference: IEEE International Conference on Cluster Computing, Belfast, United Kingdom, September 10-13, 2018
- Country of Publication: United States
- Language: English