U.S. Department of Energy
Office of Scientific and Technical Information

Online data analysis and reduction: An important co-design motif for extreme-scale computers

Journal Article · International Journal of High Performance Computing Applications

A growing disparity between supercomputer computation speeds and I/O rates means that it is rapidly becoming infeasible to analyze supercomputer application output only after that output has been written to a file system. Instead, data-generating applications must run concurrently with data reduction and/or analysis operations, with which they exchange information via high-speed methods such as interprocess communications. The resulting parallel computing motif, online data analysis and reduction (ODAR), has important implications for both application and HPC systems design. Here we introduce the ODAR motif and its co-design concerns, describe a co-design process for identifying and addressing those concerns, present tools that assist in the co-design process, and present case studies to illustrate the use of the process and tools in practical settings.
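As an illustration only (not taken from the article), the ODAR pattern described above can be sketched as a data producer running concurrently with a reducer that consumes its output through an in-memory channel rather than a file system. The names `simulate`, `reduce_online`, and `run_odar` are hypothetical, and Python threads with a bounded queue stand in for the interprocess communication a real HPC deployment would use.

```python
import queue
import threading


def simulate(steps, out):
    """Hypothetical data-generating application: emits one value per step."""
    for step in range(steps):
        out.put(float(step * step))  # stand-in for simulation output
    out.put(None)  # sentinel: no more data


def reduce_online(inp, result):
    """Online reduction: keep running statistics instead of storing raw output."""
    count, total, peak = 0, 0.0, float("-inf")
    while True:
        value = inp.get()
        if value is None:
            break
        count += 1
        total += value
        peak = max(peak, value)
    result.update(count=count, mean=total / count if count else 0.0, peak=peak)


def run_odar(steps):
    """Run producer and reducer concurrently; only the reduced result survives."""
    channel = queue.Queue(maxsize=8)  # bounded channel applies back-pressure
    result = {}
    producer = threading.Thread(target=simulate, args=(steps, channel))
    consumer = threading.Thread(target=reduce_online, args=(channel, result))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return result
```

Calling `run_odar(5)` returns a small dictionary of summary statistics; the raw per-step values are never written out, which is the point of the motif. The bounded queue is a design choice assumed here to show how a slow reducer can throttle the producer.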

Research Organization:
Argonne National Lab. (ANL), Argonne, IL (United States)
Sponsoring Organization:
USDOE Exascale Computing Project (ECP); USDOE Office of Science - Office of Basic Energy Sciences - Scientific User Facilities Division
DOE Contract Number:
AC02-06CH11357
OSTI ID:
1873528
Journal Information:
International Journal of High Performance Computing Applications, Vol. 35, Issue 6
Country of Publication:
United States
Language:
English

Similar Records

Online data analysis and reduction: An important Co-design motif for extreme-scale computers
Journal Article · 2021 · International Journal of High Performance Computing Applications · OSTI ID:1788100

...And Eat it Too: High Read Performance in Write-Optimized HPC I/O Middleware File Formats
Conference · 2009 · OSTI ID:982187

A Co-design Framework for Online Data Analysis and Reduction
Journal Article · 2021 · Concurrency and Computation. Practice and Experience · OSTI ID:1817542