A three-phase workflow for general and expressive representations of nondeterminism in HPC applications
- Univ. of Tennessee, Knoxville, TN (United States); Univ. of Delaware, Newark, DE (United States)
- Univ. of Tennessee, Knoxville, TN (United States)
- RIKEN Center for Computational Science, Tokyo (Japan)
- Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
Nondeterminism is an increasingly entrenched property of high-performance computing (HPC) applications and has recently been shown to seriously hamper debugging and reproducibility efforts. Tools for addressing the nondeterministic debugging problem have emerged, but they do not provide methods for systematically cataloging the nondeterminism in a given application. We propose a three-phase workflow for representing executions of nondeterministic Message Passing Interface (MPI) programs as event graphs, quantifying their structural similarity with graph kernels, and applying machine learning techniques to investigate shared properties across applications. We present an empirical study comparing two graph kernels’ suitability for this task and propose future uses of the methodology.
- Research Organization: Lawrence Livermore National Laboratory (LLNL), Livermore, CA (United States)
- Sponsoring Organization: USDOE National Nuclear Security Administration (NNSA)
- Grant/Contract Number: AC52-07NA27344
- OSTI ID: 1809185
- Report Number(s): LLNL-JRNL-819205; 1030102
- Journal Information: International Journal of High Performance Computing Applications, Vol. 33, Issue 6; ISSN 1094-3420
- Publisher: SAGE
- Country of Publication: United States
- Language: English