U.S. Department of Energy
Office of Scientific and Technical Information

Characterizing the Performance of Executing Many-tasks on Summit

Conference · OSTI ID: 1657902
Many scientific workloads comprise many tasks, where each task is an independent simulation or data analysis. Executing millions of tasks on heterogeneous HPC platforms requires scalable dynamic resource management and multi-level scheduling. RADICAL-Pilot (RP), an implementation of the Pilot abstraction, addresses these challenges and serves as an effective runtime system for executing many-task workloads. In this paper, we characterize the performance of executing many tasks using RP when interfaced with JSM and PRRTE on Summit: RP is responsible for resource management and task scheduling on the acquired resources; JSM or PRRTE enacts the placement and launching of scheduled tasks. Our experiments provide lower bounds on the performance of RP when integrated with JSM and PRRTE. Specifically, for workloads of homogeneous, single-core, 15-minute tasks, we find that: PRRTE scales better than JSM for more than O(1000) tasks; PRRTE overheads are negligible; and PRRTE supports optimizations that lower the impact of overheads and enable resource utilization of 63% when executing O(16K) 1-core tasks over 404 compute nodes.
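To make the resource utilization figure in the abstract easier to interpret, the following is a minimal sketch of one common way to compute such a metric: the fraction of available core-time spent executing tasks. This is an assumption about the definition, which may differ in detail from the one used in the paper. The function name, its parameters, and the toy numbers are hypothetical and are not taken from the paper or from RADICAL-Pilot.

```python
def resource_utilization(n_tasks, task_seconds, cores_per_task,
                         total_cores, makespan_seconds):
    """Fraction of available core-time spent running tasks.

    Assumed definition (not necessarily the paper's):
        busy core-seconds / (total cores * makespan)
    """
    if total_cores <= 0 or makespan_seconds <= 0:
        raise ValueError("total_cores and makespan_seconds must be positive")
    busy_core_seconds = n_tasks * task_seconds * cores_per_task
    available_core_seconds = total_cores * makespan_seconds
    return busy_core_seconds / available_core_seconds


# Toy example with made-up numbers: 4 single-core tasks of 10 s each
# on 4 cores, where scheduling and launch overheads stretch the run to 20 s.
toy = resource_utilization(n_tasks=4, task_seconds=10.0, cores_per_task=1,
                           total_cores=4, makespan_seconds=20.0)
print(f"{toy:.0%}")
```

Under this definition, any time that cores sit idle while the runtime schedules, places, or launches tasks lowers utilization, which is why reducing launcher overheads matters for short, single-core tasks.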
Research Organization:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Sponsoring Organization:
USDOE; USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR) (SC-21); USDOE Office of Science (SC)
DOE Contract Number:
AC05-00OR22725
OSTI ID:
1657902
Country of Publication:
United States
Language:
English

Similar Records

RADICAL-Pilot and PMIx/PRRTE: Executing Heterogeneous Workloads at Large Scale on Partitioned HPC Resources
Conference · Sat Dec 31 23:00:00 EST 2022 · OSTI ID:1999093

RADICAL-Pilot and PMIx/PRRTE: Executing Heterogeneous Workloads at Large Scale on Partitioned HPC Resources
Journal Article · Wed Jan 11 19:00:00 EST 2023 · Lecture Notes in Computer Science · OSTI ID:1963184

Design and Performance Characterization of RADICAL-Pilot on Leadership-Class Platforms
Journal Article · Thu Mar 31 20:00:00 EDT 2022 · IEEE Transactions on Parallel and Distributed Systems · OSTI ID:1830194
