Title: Experience and Practice of Batch Scheduling on Leadership Supercomputers at Argonne

Conference

The mission of the DOE Argonne Leadership Computing Facility (ALCF) is to accelerate major scientific discoveries and engineering breakthroughs for humanity by designing and providing world-leading computing facilities in partnership with the computational science community. The ALCF operates supercomputers that are generally among the five fastest machines in the world. Specifically, ALCF looks for science that is either too big to run anywhere else or would take so long elsewhere as to be impractical (i.e., “capability jobs”). At ALCF, batch scheduling plays a critical role in achieving a set of site goals within a set of constraints. While system utilization is an important goal at ALCF, its largest mission constraint is that extreme-scale parallel jobs take precedence. In this paper, we describe the specific scheduling goals and constraints, analyze the workload traces collected in 2013–2017 from the 48-rack petascale supercomputer Mira, and discuss the upcoming scheduling challenges at ALCF.

Research Organization:
Argonne National Lab. (ANL), Argonne, IL (United States)
Sponsoring Organization:
National Science Foundation (NSF); USDOE Office of Science - Office of Basic Energy Sciences - Scientific User Facilities Division
DOE Contract Number:
AC02-06CH11357
OSTI ID:
1481853
Resource Relation:
Conference: 21st Workshop on Job Scheduling Strategies for Parallel Processing held in conjunction with IPDPS 2017, 06/02/17 - 06/02/17, Orlando, FL, US
Country of Publication:
United States
Language:
English

