Experience and Practice of Batch Scheduling on Leadership Supercomputers at Argonne
The mission of the DOE Argonne Leadership Computing Facility (ALCF) is to accelerate major scientific discoveries and engineering breakthroughs for humanity by designing and providing world-leading computing facilities in partnership with the computational science community. The ALCF operates supercomputers that are generally among the top five fastest machines in the world. Specifically, ALCF targets science that is either too big to run anywhere else or would take so long elsewhere as to be impractical (i.e., "capability jobs"). At ALCF, batch scheduling plays a critical role in achieving a set of site goals within a set of constraints. While system utilization is an important goal, the largest mission constraint is ensuring that extreme-scale parallel jobs take precedence. In this paper, we describe the specific scheduling goals and constraints, analyze the workload traces collected from 2013 to 2017 on Mira, the 48-rack petascale supercomputer, and discuss the upcoming scheduling challenges at ALCF.
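The tension between system utilization and giving precedence to capability jobs can be made concrete with two numbers computed from a workload trace: the fraction of available node-hours actually used, and the fraction of used node-hours consumed by large jobs. The following is a minimal sketch, not the paper's analysis method. It assumes each trace record carries only a node count and a runtime, ignores jobs that straddle the analysis window, and uses a hypothetical 8,192-node (8-rack) capability threshold as a placeholder rather than ALCF's official definition. The names `Job` and `summarize` are illustrative. Mira's size is taken as 48 racks of 1,024 nodes each.

```python
from dataclasses import dataclass

# Mira: 48 racks x 1,024 nodes per rack = 49,152 nodes.
MIRA_NODES = 48 * 1024


@dataclass
class Job:
    nodes: int            # nodes allocated to the job
    runtime_hours: float  # wall-clock hours the job held those nodes


def summarize(jobs, period_hours, total_nodes=MIRA_NODES, capability_nodes=8192):
    """Return (utilization, capability_share) for a trace covering period_hours.

    utilization      = node-hours used / node-hours available
    capability_share = fraction of used node-hours from jobs with
                       nodes >= capability_nodes
    capability_nodes is a hypothetical threshold, for illustration only.
    """
    used = sum(j.nodes * j.runtime_hours for j in jobs)
    available = total_nodes * period_hours
    large = sum(j.nodes * j.runtime_hours for j in jobs
                if j.nodes >= capability_nodes)
    utilization = used / available if available else 0.0
    capability_share = large / used if used else 0.0
    return utilization, capability_share


if __name__ == "__main__":
    # A toy one-day trace: one full-machine job, one 8-rack job, one 1-rack job.
    trace = [Job(49152, 6), Job(8192, 12), Job(1024, 24)]
    u, c = summarize(trace, period_hours=24)
    print(f"utilization={u:.3f} capability_share={c:.3f}")
    # prints "utilization=0.354 capability_share=0.941"
```

In this toy trace the machine is mostly idle, yet nearly all delivered node-hours go to large jobs, which illustrates why a capability-focused site cannot judge its scheduler on utilization alone.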
- Research Organization: Argonne National Lab. (ANL), Argonne, IL (United States)
- Sponsoring Organization: National Science Foundation (NSF); USDOE Office of Science - Office of Basic Energy Sciences - Scientific User Facilities Division
- DOE Contract Number: AC02-06CH11357
- OSTI ID: 1481853
- Resource Relation: Conference: 21st Workshop on Job Scheduling Strategies for Parallel Processing, held in conjunction with IPDPS 2017, 06/02/17, Orlando, FL, US
- Country of Publication: United States
- Language: English