OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: A Heterogeneity-Aware Task Scheduler for Spark

Abstract

Big data processing systems such as Spark are used in an increasing number of diverse applications, including machine learning, graph computation, and scientific computing, each with different and dynamically changing resource needs. These applications increasingly run on heterogeneous hardware, e.g., with out-of-core accelerators. However, big data platforms do not account for the multi-dimensional heterogeneity of applications and hardware. The result is a fundamental mismatch between application and hardware characteristics on one side and the resource scheduling adopted by big data platforms on the other. For example, Hadoop and Spark consider only data locality when assigning tasks to nodes, and typically disregard hardware capabilities and their suitability for specific application requirements. In this paper, we present RUPAM, a heterogeneity-aware task scheduling system for big data platforms that considers both task-level resource characteristics and the underlying hardware characteristics while preserving data locality. RUPAM adopts a simple yet effective heuristic to decide the dominant scheduling factor (e.g., CPU, memory, or I/O) for a given task in a particular stage. Our experiments show that RUPAM improves the performance of representative applications by up to 62.3% compared to the standard Spark scheduler.
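The abstract does not spell out RUPAM's heuristic. As a rough illustration only, the Python sketch below shows one way a scheduler could pick a per-task dominant factor (CPU, memory, or I/O) and then match it to node capabilities while still preferring data-local nodes. The `TaskProfile` and `Node` types, the normalization of each demand by the largest node capacity in the cluster, and all numbers are hypothetical assumptions; this is not RUPAM's actual algorithm and not a Spark API.

```python
from dataclasses import dataclass

# Hypothetical units shared by tasks and nodes so demand/capacity ratios are
# dimensionless: cpu in cores, memory in GB, io in GB/s of disk bandwidth.
RESOURCES = ("cpu", "memory", "io")


@dataclass
class Node:
    name: str
    cpu: float     # cores available on the node
    memory: float  # memory available on the node (GB)
    io: float      # disk bandwidth of the node (GB/s)


@dataclass
class TaskProfile:
    cpu: float     # cores the task is expected to use
    memory: float  # memory the task is expected to need (GB)
    io: float      # disk bandwidth the task is expected to need (GB/s)
    # Nodes holding the task's input; empty means no locality preference.
    preferred_nodes: frozenset = frozenset()


def dominant_factor(task, cluster_max):
    """Resource with the highest demand relative to the largest node capacity."""
    return max(RESOURCES, key=lambda r: getattr(task, r) / cluster_max[r])


def pick_node(task, nodes):
    """Return (dominant factor, chosen node name) for one task."""
    cluster_max = {r: max(getattr(n, r) for n in nodes) for r in RESOURCES}
    factor = dominant_factor(task, cluster_max)
    # Keep data locality first: restrict to nodes holding the input, if any.
    local = [n for n in nodes if n.name in task.preferred_nodes]
    candidates = local or nodes
    # Among the candidates, pick the node strongest on the dominant factor.
    best = max(candidates, key=lambda n: getattr(n, factor))
    return factor, best.name


nodes = [
    Node("cpu-node", cpu=16, memory=64, io=0.5),
    Node("mem-node", cpu=8, memory=256, io=0.5),
    Node("ssd-node", cpu=8, memory=64, io=2.0),
]
memory_heavy = TaskProfile(cpu=2, memory=48, io=0.2,
                           preferred_nodes=frozenset({"mem-node", "ssd-node"}))
io_heavy = TaskProfile(cpu=1, memory=8, io=1.5)

print(pick_node(memory_heavy, nodes))  # ('memory', 'mem-node')
print(pick_node(io_heavy, nodes))      # ('io', 'ssd-node')
```

Filtering to data-local nodes before applying the dominant factor reflects the abstract's point that locality is preserved; a real scheduler would also weigh current load, free executor slots, and measured task history, all of which this sketch omits.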

Authors:
Xu, Luna [1]; Butt, Ali R. [1]; Lim, Seung-Hwan [2]; Kannan, Ramakrishnan [2]
  1. Virginia Tech, Blacksburg, VA
  2. ORNL
Publication Date:
September 2018
Research Org.:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1471879
DOE Contract Number:  
AC05-00OR22725
Resource Type:
Conference
Resource Relation:
Conference: IEEE International Conference on Cluster Computing, Belfast, United Kingdom, 9/10/2018 to 9/13/2018
Country of Publication:
United States
Language:
English

Citation Formats

Xu, Luna, Butt, Ali R., Lim, Seung-Hwan, and Kannan, Ramakrishnan. A Heterogeneity-Aware Task Scheduler for Spark. United States: N. p., 2018. Web.
Xu, Luna, Butt, Ali R., Lim, Seung-Hwan, & Kannan, Ramakrishnan. A Heterogeneity-Aware Task Scheduler for Spark. United States.
Xu, Luna, Butt, Ali R., Lim, Seung-Hwan, and Kannan, Ramakrishnan. Sat . "A Heterogeneity-Aware Task Scheduler for Spark". United States. https://www.osti.gov/servlets/purl/1471879.
@inproceedings{osti_1471879,
title = {A Heterogeneity-Aware Task Scheduler for {Spark}},
author = {Xu, Luna and Butt, Ali R. and Lim, Seung-Hwan and Kannan, Ramakrishnan},
abstractNote = {Big data processing systems such as Spark are used in an increasing number of diverse applications, including machine learning, graph computation, and scientific computing, each with different and dynamically changing resource needs. These applications increasingly run on heterogeneous hardware, e.g., with out-of-core accelerators. However, big data platforms do not account for the multi-dimensional heterogeneity of applications and hardware. The result is a fundamental mismatch between application and hardware characteristics on one side and the resource scheduling adopted by big data platforms on the other. For example, Hadoop and Spark consider only data locality when assigning tasks to nodes, and typically disregard hardware capabilities and their suitability for specific application requirements. In this paper, we present RUPAM, a heterogeneity-aware task scheduling system for big data platforms that considers both task-level resource characteristics and the underlying hardware characteristics while preserving data locality. RUPAM adopts a simple yet effective heuristic to decide the dominant scheduling factor (e.g., CPU, memory, or I/O) for a given task in a particular stage. Our experiments show that RUPAM improves the performance of representative applications by up to 62.3\% compared to the standard Spark scheduler.},
booktitle = {IEEE International Conference on Cluster Computing},
url = {https://www.osti.gov/servlets/purl/1471879},
place = {United States},
year = {2018},
month = {9}
}
