U.S. Department of Energy
Office of Scientific and Technical Information

Experience and Practice of Batch Scheduling on Leadership Supercomputers at Argonne

Conference
The mission of the DOE Argonne Leadership Computing Facility (ALCF) is to accelerate major scientific discoveries and engineering breakthroughs for humanity by designing and providing world-leading computing facilities in partnership with the computational science community. The ALCF operates supercomputers that are generally among the Top 5 fastest machines in the world. Specifically, ALCF seeks science that is either too big to run anywhere else or would take so long as to be impractical (i.e., “capability jobs”). At ALCF, batch scheduling plays a critical role in achieving a set of site goals within a set of constraints. While system utilization is an important goal at ALCF, its most important mission constraint is that extreme-scale parallel jobs take precedence. In this paper, we describe the specific scheduling goals and constraints, analyze the workload traces collected in 2013–2017 from the 48-rack petascale supercomputer Mira, and discuss the upcoming scheduling challenges at ALCF.
Research Organization:
Argonne National Laboratory (ANL)
Sponsoring Organization:
National Science Foundation (NSF); USDOE Office of Science - Office of Basic Energy Sciences - Scientific User Facilities Division
DOE Contract Number:
AC02-06CH11357
OSTI ID:
1481853
Country of Publication:
United States
Language:
English
