OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: A Co-design Framework for Online Data Analysis and Reduction

Abstract

Science applications preparing for the exascale era are increasingly exploring in situ computations comprising simulation-analysis-reduction pipelines coupled in-memory. Efficient composition and execution of such complex pipelines for a target platform is a codesign process that evaluates the impact and tradeoffs of various application- and system-specific parameters. In this article, we describe a toolset for automating performance studies of composed HPC applications that perform online data reduction and analysis. We describe Cheetah, a new framework for composing parametric studies on coupled applications, and Savanna, a runtime engine for orchestrating and executing campaigns of codesign experiments. Furthermore, this toolset facilitates understanding the impact of various factors such as process placement, synchronicity of algorithms, and storage versus compute requirements for online analysis of large data. Ultimately, we aim to create a catalog of performance results that can help scientists understand tradeoffs when designing next-generation simulations that make use of online processing techniques. We illustrate the design of Cheetah and Savanna, and present application examples that use this framework to conduct codesign studies on small clusters as well as leadership-class supercomputers.

Authors:
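Mehta, Kshitij [1]; Allen, Bryce [2]; Wolf, Matthew [1]; Logan, Jeremy [1]; Suchyta, Eric [1]; Singhal, Swati [3]; Choi, Jong Youl [1]; Takahashi, Keichi [4]; Huck, Kevin [5]; Yakushin, Igor [2]; Sussman, Alan [3]; Munson, Todd [2]; Foster, Ian [6]; Klasky, Scott [1]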
  1. Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
  2. Argonne National Lab. (ANL), Argonne, IL (United States)
  3. Univ. of Maryland, College Park, MD (United States)
  4. Nara Inst. of Science and Technology (Japan)
  5. Univ. of Oregon, Eugene, OR (United States)
  6. Argonne National Lab. (ANL), Argonne, IL (United States); Univ. of Chicago, IL (United States)
Publication Date:
August 2021
Research Org.:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF); Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). National Energy Research Scientific Computing Center (NERSC)
Sponsoring Org.:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)
OSTI Identifier:
1817542
Alternate Identifier(s):
OSTI ID: 1869488; OSTI ID: 1887662
Grant/Contract Number:  
AC05-00OR22725; AC02-05CH11231; AC02-06CH11357
Resource Type:
Journal Article: Accepted Manuscript
Journal Name:
Concurrency and Computation. Practice and Experience
Additional Journal Information:
Journal Volume: 34; Journal Issue: 14; Journal ID: ISSN 1532-0626
Publisher:
Wiley
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; Exascale; Cheetah; Savanna; CODAR; workflows; in situ; online; reduction; co-design

Citation Formats

MLA: Mehta, Kshitij, Allen, Bryce, Wolf, Matthew, Logan, Jeremy, Suchyta, Eric, Singhal, Swati, Choi, Jong Youl, Takahashi, Keichi, Huck, Kevin, Yakushin, Igor, Sussman, Alan, Munson, Todd, Foster, Ian, and Klasky, Scott. A Co-design Framework for Online Data Analysis and Reduction. United States: N. p., 2021. Web. doi:10.1002/cpe.6519.
APA: Mehta, Kshitij, Allen, Bryce, Wolf, Matthew, Logan, Jeremy, Suchyta, Eric, Singhal, Swati, Choi, Jong Youl, Takahashi, Keichi, Huck, Kevin, Yakushin, Igor, Sussman, Alan, Munson, Todd, Foster, Ian, & Klasky, Scott. A Co-design Framework for Online Data Analysis and Reduction. United States. https://doi.org/10.1002/cpe.6519
Chicago: Mehta, Kshitij, Allen, Bryce, Wolf, Matthew, Logan, Jeremy, Suchyta, Eric, Singhal, Swati, Choi, Jong Youl, Takahashi, Keichi, Huck, Kevin, Yakushin, Igor, Sussman, Alan, Munson, Todd, Foster, Ian, and Klasky, Scott. 2021. "A Co-design Framework for Online Data Analysis and Reduction". United States. https://doi.org/10.1002/cpe.6519. https://www.osti.gov/servlets/purl/1817542.
@article{osti_1817542,
title = {A Co-design Framework for Online Data Analysis and Reduction},
author = {Mehta, Kshitij and Allen, Bryce and Wolf, Matthew and Logan, Jeremy and Suchyta, Eric and Singhal, Swati and Choi, Jong Youl and Takahashi, Keichi and Huck, Kevin and Yakushin, Igor and Sussman, Alan and Munson, Todd and Foster, Ian and Klasky, Scott},
abstractNote = {Science applications preparing for the exascale era are increasingly exploring in situ computations comprising simulation-analysis-reduction pipelines coupled in-memory. Efficient composition and execution of such complex pipelines for a target platform is a codesign process that evaluates the impact and tradeoffs of various application- and system-specific parameters. In this article, we describe a toolset for automating performance studies of composed HPC applications that perform online data reduction and analysis. We describe Cheetah, a new framework for composing parametric studies on coupled applications, and Savanna, a runtime engine for orchestrating and executing campaigns of codesign experiments. Furthermore, this toolset facilitates understanding the impact of various factors such as process placement, synchronicity of algorithms, and storage versus compute requirements for online analysis of large data. Ultimately, we aim to create a catalog of performance results that can help scientists understand tradeoffs when designing next-generation simulations that make use of online processing techniques. We illustrate the design of Cheetah and Savanna, and present application examples that use this framework to conduct codesign studies on small clusters as well as leadership-class supercomputers.},
doi = {10.1002/cpe.6519},
url = {https://www.osti.gov/biblio/1817542},
journal = {Concurrency and Computation. Practice and Experience},
issn = {1532-0626},
number = 14,
volume = 34,
place = {United States},
year = {2021},
month = {8}
}
