Title: A Heterogeneity-Aware Task Scheduler for Spark

Conference

Big data processing systems such as Spark are employed in an increasing number of diverse applications (such as machine learning, graph computation, and scientific computing), each with dynamic and different resource needs. These applications increasingly run on heterogeneous hardware, e.g., with out-of-core accelerators. However, big data platforms do not factor in the multi-dimensional heterogeneity of applications and hardware. This leads to a fundamental mismatch between application and hardware characteristics on the one hand and the resource scheduling adopted in big data platforms on the other. For example, Hadoop and Spark consider only data locality when assigning tasks to nodes, and typically disregard hardware capabilities and their suitability for specific application requirements. In this paper, we present RUPAM, a heterogeneity-aware task scheduling system for big data platforms, which considers both task-level resource characteristics and underlying hardware characteristics while preserving data locality. RUPAM adopts a simple yet effective heuristic to decide the dominant scheduling factor (e.g., CPU, memory, or I/O), given a task in a particular stage. Our experiments show that RUPAM is able to improve the performance of representative applications by up to 62.3% compared to the standard Spark scheduler.

Research Organization:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Sponsoring Organization:
USDOE
DOE Contract Number:
AC05-00OR22725
OSTI ID:
1471879
Resource Relation:
Conference: IEEE International Conference on Cluster Computing - Belfast, United Kingdom - 9/10/2018 8:00:00 AM-9/13/2018 4:00:00 AM
Country of Publication:
United States
Language:
English
