OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Experience and Practice of Batch Scheduling on Leadership Supercomputers at Argonne

Abstract

The mission of the DOE Argonne Leadership Computing Facility (ALCF) is to accelerate major scientific discoveries and engineering breakthroughs for humanity by designing and providing world-leading computing facilities in partnership with the computational science community. The ALCF operates supercomputers that are generally amongst the Top 5 fastest machines in the world. Specifically, ALCF is looking for the science that is either too big to run anywhere else, or it would take so long as to be impractical (i.e., “capability jobs”). At ALCF, batch scheduling plays a critical role for achieving a set of site goals within a set of constraints. While system utilization is an important goal at ALCF, its largest mission constraint is to enable extreme scale parallel jobs to take precedence. In this paper, we will describe the specific scheduling goals and constraints, analyze the workload traces collected in 2013–2017 from the 48-rack petascale supercomputer Mira, and discuss the upcoming scheduling challenges at ALCF.

Authors:
Allcock, William; Rich, Paul; Fan, Yuping; Lan, Zhiling
Publication Date:
February 2018
Research Org.:
Argonne National Lab. (ANL), Argonne, IL (United States)
Sponsoring Org.:
National Science Foundation (NSF); USDOE Office of Science - Office of Basic Energy Sciences - Scientific User Facilities Division
OSTI Identifier:
1481853
DOE Contract Number:  
AC02-06CH11357
Resource Type:
Conference
Resource Relation:
Conference: 21st Workshop on Job Scheduling Strategies for Parallel Processing held in conjunction with IPDPS 2017, 06/02/17 - 06/02/17, Orlando, FL, US
Country of Publication:
United States
Language:
English

Citation Formats

Allcock, William, Rich, Paul, Fan, Yuping, and Lan, Zhiling. Experience and Practice of Batch Scheduling on Leadership Supercomputers at Argonne. United States: N. p., 2018. Web. doi:10.1007/978-3-319-77398-8_1.
Allcock, William, Rich, Paul, Fan, Yuping, & Lan, Zhiling. Experience and Practice of Batch Scheduling on Leadership Supercomputers at Argonne. United States. doi:10.1007/978-3-319-77398-8_1.
Allcock, William, Rich, Paul, Fan, Yuping, and Lan, Zhiling. Wed . "Experience and Practice of Batch Scheduling on Leadership Supercomputers at Argonne". United States. doi:10.1007/978-3-319-77398-8_1.
@article{osti_1481853,
title = {Experience and Practice of Batch Scheduling on Leadership Supercomputers at Argonne},
author = {Allcock, William and Rich, Paul and Fan, Yuping and Lan, Zhiling},
abstractNote = {The mission of the DOE Argonne Leadership Computing Facility (ALCF) is to accelerate major scientific discoveries and engineering breakthroughs for humanity by designing and providing world-leading computing facilities in partnership with the computational science community. The ALCF operates supercomputers that are generally amongst the Top 5 fastest machines in the world. Specifically, ALCF is looking for the science that is either too big to run anywhere else, or it would take so long as to be impractical (i.e., “capability jobs”). At ALCF, batch scheduling plays a critical role for achieving a set of site goals within a set of constraints. While system utilization is an important goal at ALCF, its largest mission constraint is to enable extreme scale parallel jobs to take precedence. In this paper, we will describe the specific scheduling goals and constraints, analyze the workload traces collected in 2013–2017 from the 48-rack petascale supercomputer Mira, and discuss the upcoming scheduling challenges at ALCF.},
doi = {10.1007/978-3-319-77398-8_1},
journal = {},
number = {},
volume = {},
place = {United States},
year = {2018},
month = {2}
}

