The Ghost of Performance Reproducibility Past
- University of Oregon, Eugene, OR (United States)
- Brookhaven National Laboratory (BNL), Upton, NY (United States)
- Rutgers University, New Brunswick, NJ (United States); Brookhaven National Laboratory (BNL), Upton, NY (United States)
The importance of ensemble computing is well established. However, executing ensembles at scale introduces interesting performance fluctuations that have not been well investigated. In this paper, we trace our experience uncovering performance fluctuations of ensemble applications (primarily constituting a workflow of GROMACS tasks), and unsuccessful attempts, so far, at trying to discern the underlying cause(s) of performance fluctuations. Is the failure to discern the causative or contributing factors a failure of capability? Or imagination? Do the fluctuations have their genesis in some inscrutable aspect of the system or software? Does it warrant a fundamental reassessment and rethinking of how we assume and conceptualize performance reproducibility? Answers to these questions are not straightforward, nor are they immediate or obvious. We conclude with a discussion about the performance of ensemble applications and ruminate over the implications for how we define and measure application performance.
- Research Organization:
- Brookhaven National Laboratory (BNL), Upton, NY (United States)
- Sponsoring Organization:
- USDOE Office of Science (SC), Advanced Scientific Computing Research
- Grant/Contract Number:
- SC0012704
- OSTI ID:
- 1963183
- Report Number(s):
- BNL-224122-2023-JAAM
- Journal Information:
- Proceedings - IEEE International Conference on eScience (Online), Vol. 2022; ISSN 2325-372X
- Publisher:
- IEEE
- Country of Publication:
- United States
- Language:
- English