U.S. Department of Energy
Office of Scientific and Technical Information

Priority research directions for in situ data management: Enabling scientific discovery from diverse data sources

Journal Article · International Journal of High Performance Computing Applications
 [1];  [2];  [3];  [4];  [5];  [6];  [7];  [8]
  1. Argonne National Laboratory, Lemont, IL, USA
  2. National Energy Research Scientific Computing Center, Berkeley, CA, USA
  3. Sandia National Laboratories, Livermore, CA, USA
  4. Lawrence Berkeley National Laboratory, Berkeley, CA, USA
  5. Sandia National Laboratories, Albuquerque, NM, USA
  6. Brookhaven National Laboratory, Upton, NY, USA
  7. Los Alamos National Laboratory, Los Alamos, NM, USA
  8. Oak Ridge National Laboratory, Oak Ridge, TN, USA

In January 2019, the US Department of Energy, Office of Science program in Advanced Scientific Computing Research, convened a workshop to identify priority research directions (PRDs) for in situ data management (ISDM). A fundamental finding of this workshop is that the methodologies used to manage data among a variety of tasks in situ can be used to facilitate scientific discovery from many different data sources—simulation, experiment, and sensors, for example—and that being able to do so at numerous computing scales will benefit real-time decision-making, design optimization, and data-driven scientific discovery. This article describes six PRDs identified by the workshop, which highlight the components and capabilities needed for ISDM to be successful for a wide variety of applications—making ISDM capabilities more pervasive, controllable, composable, and transparent, with a focus on greater coordination with the software stack and a diversity of fundamentally new data algorithms.

Research Organization:
Brookhaven National Laboratory (BNL), Upton, NY (United States); Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States); Los Alamos National Laboratory (LANL), Los Alamos, NM (United States); Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States); Sandia National Laboratories (SNL-CA), Livermore, CA (United States); Sandia National Laboratories (SNL-NM), Albuquerque, NM (United States)
Sponsoring Organization:
USDOE National Nuclear Security Administration (NNSA); USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)
Grant/Contract Number:
89233218CNA000001; AC02-05CH11231; AC02-06CH11357; AC04-94AL85000; AC05-00OR22725; NA0003525; SC0012704
OSTI ID:
1606603
Alternate ID(s):
OSTI ID: 1617316
OSTI ID: 1761657
OSTI ID: 1635099
OSTI ID: 1650112
OSTI ID: 1776768
Report Number(s):
BNL--216074-2020-JAAM; LA-UR--20-28344; SAND--2020-4034J
Journal Information:
Journal Name: International Journal of High Performance Computing Applications; ISSN 1094-3420
Publisher:
SAGE
Country of Publication:
United States
Language:
English
