OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: MARBLE: A Multi-GPU Aware Job Scheduler for Deep Learning on HPC Systems

Abstract

Deep learning (DL) has become a key tool for solving complex scientific problems. However, managing the multi-dimensional, large-scale data associated with DL, especially across the multiple graphics processing units (GPUs) of modern supercomputers, poses significant challenges. Moreover, the latest high-performance computing (HPC) architectures exhibit different training-throughput trends than those reported in existing studies. Established DL optimizations, such as larger batch sizes and GPU locality-aware scheduling, do little to improve DL training throughput on these systems because of their fast CPU-to-GPU connections. Additionally, DL training on multiple GPUs scales sublinearly, so simply adding more GPUs to a system is ineffective. To this end, we design MARBLE, a first-of-its-kind job scheduler that accounts for the non-linear intra-node scalability of GPUs to schedule an appropriate number of GPUs per node for each job. By sharing the GPU resources of a node among multiple DL jobs, MARBLE avoids the low GPU utilization typical of current multi-GPU DL training on HPC systems. Our comprehensive evaluation on the Summit supercomputer shows that MARBLE improves DL training performance by up to 48.3% compared to the popular Platform Load Sharing Facility (LSF) scheduler, and reduces job completion time by up to 47% compared to Optimus, a state-of-the-art DL scheduler.
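To make the scheduling idea concrete, here is a minimal, hypothetical sketch. The record carries no code, so the throughput numbers, the marginal-gain threshold, the first-fit packing, and every helper name below are assumptions for illustration, not MARBLE's published algorithm. The sketch mirrors the abstract's two observations: per-node DL throughput grows sublinearly with GPU count, so a scheduler can stop allocating GPUs to a job once the marginal gain falls off, and it can then co-locate other jobs on the node's remaining GPUs.

# Illustrative sketch only; all numbers and names are hypothetical.
# Hypothetical measured training throughput (samples/sec) on one node,
# indexed by number of GPUs used by a single job. Note the sublinear scaling.
THROUGHPUT = {1: 100.0, 2: 185.0, 3: 250.0, 4: 295.0, 5: 325.0, 6: 340.0}

GPUS_PER_NODE = 6  # a Summit node has 6 NVIDIA V100 GPUs


def gpus_per_job(min_marginal_gain: float = 0.5) -> int:
    """Pick the largest GPU count whose marginal speedup per added GPU
    is still at least min_marginal_gain of the single-GPU throughput."""
    g = 1
    while g + 1 in THROUGHPUT:
        marginal = THROUGHPUT[g + 1] - THROUGHPUT[g]
        if marginal < min_marginal_gain * THROUGHPUT[1]:
            break  # adding another GPU no longer pays off
        g += 1
    return g


def schedule(jobs: list[str]) -> dict[int, list[tuple[str, int]]]:
    """First-fit packing of jobs onto shared nodes, g GPUs per job."""
    g = gpus_per_job()
    nodes: dict[int, list[tuple[str, int]]] = {}
    free: dict[int, int] = {}  # node id -> free GPUs
    for job in jobs:
        placed = False
        for n, f in free.items():
            if f >= g:  # co-locate on a node with spare GPUs
                nodes[n].append((job, g))
                free[n] -= g
                placed = True
                break
        if not placed:  # open a new node
            n = len(nodes)
            nodes[n] = [(job, g)]
            free[n] = GPUS_PER_NODE - g
    return nodes


if __name__ == "__main__":
    print("GPUs per job:", gpus_per_job())
    for node, placement in schedule(["jobA", "jobB", "jobC"]).items():
        print(f"node {node}: {placement}")

With the hypothetical curve above, the sketch caps each job at three GPUs (the fourth GPU's marginal gain drops below half of the single-GPU throughput) and places two of the three jobs on one shared six-GPU node, using two nodes instead of three, which is the utilization effect the abstract describes.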

Authors:
Han, Jingoo [1]; Rafique, Mustafa [2]; Xu, Luna [1]; Butt, Ali R. [1]; Lim, Seung-Hwan [3]; Vazhkudai, Sudharshan [3]
  1. Virginia Tech, Blacksburg, VA
  2. Rochester Institute of Technology, Rochester, NY
  3. Oak Ridge National Laboratory (ORNL), Oak Ridge, TN
Publication Date:
May 2020
Research Org.:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1649080
DOE Contract Number:  
AC05-00OR22725
Resource Type:
Conference
Resource Relation:
Conference: The 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID), Melbourne, Australia, 11-14 May 2020
Country of Publication:
United States
Language:
English

Citation Formats

Han, Jingoo, Rafique, Mustafa, Xu, Luna, Butt, Ali R., Lim, Seung-Hwan, and Vazhkudai, Sudharshan. MARBLE: A Multi-GPU Aware Job Scheduler for Deep Learning on HPC Systems. United States: N. p., 2020. Web. doi:10.1109/CCGrid49817.2020.00-66.
Han, Jingoo, Rafique, Mustafa, Xu, Luna, Butt, Ali R., Lim, Seung-Hwan, & Vazhkudai, Sudharshan. MARBLE: A Multi-GPU Aware Job Scheduler for Deep Learning on HPC Systems. United States. doi:10.1109/CCGrid49817.2020.00-66.
Han, Jingoo, Rafique, Mustafa, Xu, Luna, Butt, Ali R., Lim, Seung-Hwan, and Vazhkudai, Sudharshan. 2020. "MARBLE: A Multi-GPU Aware Job Scheduler for Deep Learning on HPC Systems". United States. doi:10.1109/CCGrid49817.2020.00-66. https://www.osti.gov/servlets/purl/1649080.
@article{osti_1649080,
title = {MARBLE: A Multi-GPU Aware Job Scheduler for Deep Learning on HPC Systems},
author = {Han, Jingoo and Rafique, Mustafa and Xu, Luna and Butt, Ali R. and Lim, Seung-Hwan and Vazhkudai, Sudharshan},
abstractNote = {Deep learning (DL) has become a key tool for solving complex scientific problems. However, managing the multi-dimensional, large-scale data associated with DL, especially across the multiple graphics processing units (GPUs) of modern supercomputers, poses significant challenges. Moreover, the latest high-performance computing (HPC) architectures exhibit different training-throughput trends than those reported in existing studies. Established DL optimizations, such as larger batch sizes and GPU locality-aware scheduling, do little to improve DL training throughput on these systems because of their fast CPU-to-GPU connections. Additionally, DL training on multiple GPUs scales sublinearly, so simply adding more GPUs to a system is ineffective. To this end, we design MARBLE, a first-of-its-kind job scheduler that accounts for the non-linear intra-node scalability of GPUs to schedule an appropriate number of GPUs per node for each job. By sharing the GPU resources of a node among multiple DL jobs, MARBLE avoids the low GPU utilization typical of current multi-GPU DL training on HPC systems. Our comprehensive evaluation on the Summit supercomputer shows that MARBLE improves DL training performance by up to 48.3% compared to the popular Platform Load Sharing Facility (LSF) scheduler, and reduces job completion time by up to 47% compared to Optimus, a state-of-the-art DL scheduler.},
doi = {10.1109/CCGrid49817.2020.00-66},
place = {United States},
year = {2020},
month = {5}
}

Other availability
Please see Document Availability for additional information on obtaining the full-text document. Library patrons may search WorldCat to identify libraries that hold this conference proceeding.
