OSTI.GOV · U.S. Department of Energy
Office of Scientific and Technical Information

Title: Three practical workflow schedulers for easy maximum parallelism

Journal Article · Software, Practice and Experience
DOI: https://doi.org/10.1002/spe.3047 · OSTI ID: 1843707

Runtime scheduling and workflow systems are an increasingly popular algorithmic component in HPC because they allow full system utilization with relaxed synchronization requirements. There are so many special-purpose tools for task scheduling that one might wonder why more are needed. Use cases seen on the Summit supercomputer needed better integration with MPI and greater flexibility in job launch configurations. Preparation, execution, and analysis of computational chemistry simulations at the scale of tens of thousands of processors revealed three distinct workflow patterns. A separate job scheduler was implemented for each one using extremely simple and robust designs: file-based, task-list-based, and bulk-synchronous. Comparison with existing methods shows unique benefits of this work, including simplicity of design, suitability for HPC centers, short startup time, and well-understood per-task overhead. All three new tools have been shown to scale to full utilization of Summit and have been made publicly available with tests and documentation. This work presents a complete characterization of the minimum effective task granularity for efficient scheduler use. These schedulers have the same bottlenecks, and hence similar task granularities, as those reported for existing tools that follow comparable paradigms.
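The abstract links minimum effective task granularity to per-task overhead and to scheduler bottlenecks. As a rough illustration only, the sketch below uses a simple back-of-envelope model under two assumptions that are not taken from the paper: every task costs its worker a fixed overhead, and a single central dispatcher (as a task-list design might use) can issue at most one task every `dispatch` seconds. All function names and numbers are hypothetical.

```python
"""Back-of-envelope model of minimum effective task granularity.

Illustrative assumptions (hypothetical, not from the paper):
  * every task costs its worker a fixed `overhead` in seconds
    (launch, bookkeeping, result reporting);
  * one central dispatcher hands out at most one task every `dispatch` seconds.
"""


def worker_efficiency(task: float, overhead: float) -> float:
    """Fraction of a worker's time spent on useful work."""
    return task / (task + overhead)


def min_task_for_efficiency(overhead: float, target: float) -> float:
    """Smallest task duration t with t / (t + overhead) >= target."""
    if not 0.0 < target < 1.0:
        raise ValueError("target efficiency must lie strictly between 0 and 1")
    return overhead * target / (1.0 - target)


def min_task_for_dispatcher(workers: int, dispatch: float, overhead: float) -> float:
    """Smallest task duration that keeps all workers fed.

    Each worker finishes one task every (t + overhead) seconds, so the pool
    requests workers / (t + overhead) tasks per second. A serial dispatcher
    supplies at most 1 / dispatch, so t + overhead >= workers * dispatch.
    """
    return max(0.0, workers * dispatch - overhead)


def min_effective_granularity(workers: int, overhead: float,
                              dispatch: float, target: float = 0.9) -> float:
    """Both constraints must hold, so the larger lower bound wins."""
    return max(min_task_for_efficiency(overhead, target),
               min_task_for_dispatcher(workers, dispatch, overhead))


if __name__ == "__main__":
    # Illustrative numbers only: 4000 workers, 10 ms overhead per task,
    # a dispatcher that issues 10,000 tasks per second, 90% target efficiency.
    t = min_effective_granularity(4000, 0.01, 1e-4, 0.9)
    print(f"minimum task duration: {t:.2f} s")  # dispatcher bound dominates here
```

In this toy model, adding workers raises the dispatcher bound while the per-task overhead sets a floor that does not depend on scale; actual granularities for these tools should be taken from the paper's measurements, not from this model.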

Research Organization:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Sponsoring Organization:
USDOE Office of Science (SC)
Grant/Contract Number:
AC05-00OR22725
OSTI ID:
1843707
Journal Information:
Software, Practice and Experience, Vol. 53, Issue 1; ISSN 0038-0644
Publisher:
Wiley
Country of Publication:
United States
Language:
English
