The Ghost of Performance Reproducibility Past

Journal Article · Proceedings - IEEE International Conference on eScience (Online)
 [1];  [2];  [3];  [3];  [1]
  1. University of Oregon, Eugene, OR (United States)
  2. Brookhaven National Laboratory (BNL), Upton, NY (United States)
  3. Rutgers University, New Brunswick, NJ (United States); Brookhaven National Laboratory (BNL), Upton, NY (United States)

The importance of ensemble computing is well established. However, executing ensembles at scale introduces performance fluctuations that have not been well investigated. In this paper, we trace our experience uncovering performance fluctuations in ensemble applications (primarily a workflow of GROMACS tasks) and our attempts, so far unsuccessful, to discern the underlying cause(s) of those fluctuations. Is the failure to discern the causative or contributing factors a failure of capability? Or of imagination? Do the fluctuations have their genesis in some inscrutable aspect of the system or software? Does this warrant a fundamental reassessment of how we assume and conceptualize performance reproducibility? Answers to these questions are neither straightforward nor immediately obvious. We conclude with a discussion of the performance of ensemble applications and reflect on the implications for how we define and measure application performance.

Research Organization:
Brookhaven National Laboratory (BNL), Upton, NY (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Advanced Scientific Computing Research
Grant/Contract Number:
SC0012704
OSTI ID:
1963183
Report Number(s):
BNL-224122-2023-JAAM
Journal Information:
Proceedings - IEEE International Conference on eScience (Online), Vol. 2022; ISSN 2325-372X
Publisher:
IEEE
Country of Publication:
United States
Language:
English
