DOE PAGES
U.S. Department of Energy
Office of Scientific and Technical Information

Title: Priority research directions for in situ data management: Enabling scientific discovery from diverse data sources

Abstract

© The Author(s) 2020. In January 2019, the US Department of Energy, Office of Science program in Advanced Scientific Computing Research, convened a workshop to identify priority research directions (PRDs) for in situ data management (ISDM). A fundamental finding of this workshop is that the methodologies used to manage data among a variety of tasks in situ can be used to facilitate scientific discovery from many different data sources—simulation, experiment, and sensors, for example—and that being able to do so at numerous computing scales will benefit real-time decision-making, design optimization, and data-driven scientific discovery. This article describes six PRDs identified by the workshop, which highlight the components and capabilities needed for ISDM to be successful for a wide variety of applications—making ISDM capabilities more pervasive, controllable, composable, and transparent, with a focus on greater coordination with the software stack and a diversity of fundamentally new data algorithms.

Authors:
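Peterka, Tom [1]; Bard, Deborah [2]; Bennett, Janine C. [3]; Bethel, E. Wes [4]; Oldfield, Ron A. [5]; Pouchard, Line [6]; Sweeney, Christine [7]; Wolf, Matthew [8]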
  1. Argonne National Laboratory, Lemont, IL, USA
  2. National Energy Research Scientific Computing Center, Berkeley, CA, USA
  3. Sandia National Laboratories, Livermore, CA, USA
  4. Lawrence Berkeley National Laboratory, Berkeley, CA, USA
  5. Sandia National Laboratories, Albuquerque, NM, USA
  6. Brookhaven National Laboratory, Upton, NY, USA
  7. Los Alamos National Laboratory, Los Alamos, NM, USA
  8. Oak Ridge National Laboratory, Oak Ridge, TN, USA
Publication Date:
2020-03-27
Research Org.:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States); Sandia National Lab. (SNL-CA), Livermore, CA (United States); Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States); Brookhaven National Laboratory (BNL), Upton, NY (United States); Los Alamos National Laboratory (LANL), Los Alamos, NM (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR); USDOE National Nuclear Security Administration (NNSA)
OSTI Identifier:
1606603
Alternate Identifier(s):
OSTI ID: 1617316; OSTI ID: 1635099; OSTI ID: 1650112; OSTI ID: 1761657; OSTI ID: 1776768
Report Number(s):
SAND-2020-4034J; BNL-216074-2020-JAAM; LA-UR-20-28344
Journal ID: ISSN 1094-3420
Grant/Contract Number:  
AC04-94AL85000; AC02-06CH11357; NA0003525; AC02-05CH11231; 89233218CNA000001; SC0012704; AC05-00OR22725
Resource Type:
Published Article
Journal Name:
International Journal of High Performance Computing Applications
Publisher:
SAGE
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; data analysis; scientific workflows; in situ data management; computer science; high performance computing

Citation Formats

Peterka, Tom, Bard, Deborah, Bennett, Janine C., Bethel, E. Wes, Oldfield, Ron A., Pouchard, Line, Sweeney, Christine, and Wolf, Matthew. Priority research directions for in situ data management: Enabling scientific discovery from diverse data sources. United States: N. p., 2020. Web. doi:10.1177/1094342020913628.
Peterka, Tom, Bard, Deborah, Bennett, Janine C., Bethel, E. Wes, Oldfield, Ron A., Pouchard, Line, Sweeney, Christine, & Wolf, Matthew. Priority research directions for in situ data management: Enabling scientific discovery from diverse data sources. United States. https://doi.org/10.1177/1094342020913628
Peterka, Tom, Bard, Deborah, Bennett, Janine C., Bethel, E. Wes, Oldfield, Ron A., Pouchard, Line, Sweeney, Christine, and Wolf, Matthew. 2020. "Priority research directions for in situ data management: Enabling scientific discovery from diverse data sources". United States. https://doi.org/10.1177/1094342020913628.
@article{osti_1606603,
title = {Priority research directions for in situ data management: Enabling scientific discovery from diverse data sources},
author = {Peterka, Tom and Bard, Deborah and Bennett, Janine C. and Bethel, E. Wes and Oldfield, Ron A. and Pouchard, Line and Sweeney, Christine and Wolf, Matthew},
abstractNote = {© The Author(s) 2020. In January 2019, the US Department of Energy, Office of Science program in Advanced Scientific Computing Research, convened a workshop to identify priority research directions (PRDs) for in situ data management (ISDM). A fundamental finding of this workshop is that the methodologies used to manage data among a variety of tasks in situ can be used to facilitate scientific discovery from many different data sources—simulation, experiment, and sensors, for example—and that being able to do so at numerous computing scales will benefit real-time decision-making, design optimization, and data-driven scientific discovery. This article describes six PRDs identified by the workshop, which highlight the components and capabilities needed for ISDM to be successful for a wide variety of applications—making ISDM capabilities more pervasive, controllable, composable, and transparent, with a focus on greater coordination with the software stack and a diversity of fundamentally new data algorithms.},
doi = {10.1177/1094342020913628},
journal = {International Journal of High Performance Computing Applications},
place = {United States},
year = {2020},
month = {3}
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record
https://doi.org/10.1177/1094342020913628

Citation Metrics:
Cited by: 10 works
Citation information provided by
Web of Science
