CoREC: Scalable and Resilient In-memory Data Staging for In-situ Workflows

Duan, Shaohua; Subedi, Pradeep; Davis, Philip; Teranishi, Keita; Kolla, Hemanth; Gamell, Marc; Parashar, Manish

doi:10.1145/3391448

Abstract

The dramatic increase in the scale of current and planned high-end HPC systems is leading to new challenges, such as the growing costs of data movement and IO, and the reduced mean time between failures (MTBF) of system components. In-situ workflows, i.e., executing the entire application workflows on the HPC system, have emerged as an attractive approach to address data-related challenges by moving computations closer to the data, and staging-based frameworks have been effectively used to support in-situ workflows at scale. However, the resilience of these staging-based solutions has not been addressed, and they remain susceptible to expensive data failures. Furthermore, naive use of data resilience techniques such as n-way replication and erasure codes can impact latency and/or result in significant storage overheads. In this article, we present CoREC, a scalable and resilient in-memory data staging runtime for large-scale in-situ workflows. CoREC uses a novel hybrid approach that combines dynamic replication with erasure coding based on data access patterns. It also leverages multiple levels of replication and erasure coding to support diverse data resiliency requirements. Furthermore, the article presents optimizations for load balancing and conflict-avoiding encoding, and a low-overhead, lazy data recovery scheme. We have implemented the CoREC runtime and deployed it with the DataSpaces staging service on leadership-class computing machines, and present an experimental evaluation in the article. The experiments demonstrate that CoREC can tolerate in-memory data failures while maintaining low latency and sustaining high overall storage efficiency at large scales.
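The trade-off the abstract describes, replication versus erasure coding chosen by access pattern, can be illustrated with a minimal sketch. This is not CoREC's implementation: the single-parity XOR code, the `choose_scheme` policy (replicate frequently accessed objects, erasure-code the rest), and every function name and threshold below are assumptions made for illustration only.

```python
from functools import reduce
from typing import List, Optional, Tuple


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length chunks."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode_parity(chunks: List[bytes]) -> bytes:
    """Single-parity erasure code: the parity chunk is the XOR of all data chunks."""
    return reduce(xor_bytes, chunks)


def recover_missing(chunks: List[Optional[bytes]], parity: bytes) -> Tuple[int, bytes]:
    """Rebuild the one lost chunk (marked None) from the survivors and the parity."""
    missing = [i for i, c in enumerate(chunks) if c is None]
    if len(missing) != 1:
        raise ValueError("single parity can repair exactly one lost chunk")
    survivors = [c for c in chunks if c is not None]
    return missing[0], reduce(xor_bytes, survivors, parity)


def storage_ratio_replication(copies: int) -> float:
    """Stored bytes per byte of data when keeping `copies` full copies."""
    return float(copies)


def storage_ratio_erasure(k: int, m: int) -> float:
    """Stored bytes per byte of data with k data chunks and m parity chunks."""
    return (k + m) / k


def choose_scheme(access_count: int, hot_threshold: int) -> str:
    """Hypothetical policy: replicate hot objects, erasure-code cold ones."""
    return "replicate" if access_count >= hot_threshold else "erasure"
```

Under these assumptions, 3-way replication stores 3.0 bytes per data byte, while a (4 data, 2 parity) erasure code stores 1.5, at the price of an encode step on writes and a decode step on recovery. That cost asymmetry is why a hybrid policy keyed to access frequency is attractive.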

Authors:

Duan, Shaohua [1]; Subedi, Pradeep [1]; Davis, Philip [1]; Teranishi, Keita [2]; Kolla, Hemanth [2]; Gamell, Marc [3]; Parashar, Manish [1]

[1] Rutgers Univ., Piscataway, NJ (United States)
[2] Sandia National Lab. (SNL-CA), Livermore, CA (United States)
[3] Intel, Austin, TX (United States)

Publication Date: 2020-05-31

Research Org.: Sandia National Lab. (SNL-CA), Livermore, CA (United States)

Sponsoring Org.: USDOE National Nuclear Security Administration (NNSA)

OSTI Identifier: 1769940

Report Number(s): SAND-2021-2256J
Journal ID: ISSN 2329-4949; 694181

Grant/Contract Number: AC04-94AL85000

Resource Type: Accepted Manuscript

Journal Name: ACM Transactions on Parallel Computing

Additional Journal Information: Journal Volume: 7; Journal Issue: 2; Journal ID: ISSN 2329-4949

Publisher: Association for Computing Machinery

Country of Publication: United States

Language: English

Subject: 97 MATHEMATICS AND COMPUTING; Data resilience; Erasure codes; Replication; In-situ workflows; Data staging

Citation Formats


Duan, Shaohua, Subedi, Pradeep, Davis, Philip, Teranishi, Keita, Kolla, Hemanth, Gamell, Marc, and Parashar, Manish. CoREC: Scalable and Resilient In-memory Data Staging for In-situ Workflows. United States: N. p., 2020. Web. doi:10.1145/3391448.

Duan, Shaohua, Subedi, Pradeep, Davis, Philip, Teranishi, Keita, Kolla, Hemanth, Gamell, Marc, & Parashar, Manish. CoREC: Scalable and Resilient In-memory Data Staging for In-situ Workflows. United States. https://doi.org/10.1145/3391448

Duan, Shaohua, Subedi, Pradeep, Davis, Philip, Teranishi, Keita, Kolla, Hemanth, Gamell, Marc, and Parashar, Manish. 2020. "CoREC: Scalable and Resilient In-memory Data Staging for In-situ Workflows". United States. https://doi.org/10.1145/3391448. https://www.osti.gov/servlets/purl/1769940.

@article{osti_1769940,
  title        = {CoREC: Scalable and Resilient In-memory Data Staging for In-situ Workflows},
  author       = {Duan, Shaohua and Subedi, Pradeep and Davis, Philip and Teranishi, Keita and Kolla, Hemanth and Gamell, Marc and Parashar, Manish},
  abstractNote = {The dramatic increase in the scale of current and planned high-end HPC systems is leading to new challenges, such as the growing costs of data movement and IO, and the reduced mean time between failures (MTBF) of system components. In-situ workflows, i.e., executing the entire application workflows on the HPC system, have emerged as an attractive approach to address data-related challenges by moving computations closer to the data, and staging-based frameworks have been effectively used to support in-situ workflows at scale. However, the resilience of these staging-based solutions has not been addressed, and they remain susceptible to expensive data failures. Furthermore, naive use of data resilience techniques such as n-way replication and erasure codes can impact latency and/or result in significant storage overheads. In this article, we present CoREC, a scalable and resilient in-memory data staging runtime for large-scale in-situ workflows. CoREC uses a novel hybrid approach that combines dynamic replication with erasure coding based on data access patterns. It also leverages multiple levels of replications and erasure coding to support diverse data resiliency requirements. Furthermore, the article presents optimizations for load balancing and conflict-avoiding encoding, and a low overhead, lazy data recovery scheme. We have implemented the CoREC runtime and have deployed with the DataSpaces staging service on leadership class computing machines and present an experimental evaluation in the article. Here, the experiments demonstrate that CoREC can tolerate in-memory data failures while maintaining low latency and sustaining high overall storage efficiency at large scales.},
  doi          = {10.1145/3391448},
  journal      = {ACM Transactions on Parallel Computing},
  number       = 2,
  volume       = 7,
  place        = {United States},
  year         = {2020},
  month        = {5}
}

Journal Article:

Free Publicly Available Full Text

Accepted Manuscript (DOE)

Publisher's Version of Record

https://doi.org/10.1145/3391448


Works referenced in this record:

Combining Partial Redundancy and Checkpointing for HPC
conference, June 2012

Elliott, James; Kharbas, Kishor; Fiala, David
2012 IEEE 32nd International Conference on Distributed Computing Systems (ICDCS)
DOI: 10.1109/ICDCS.2012.56

Terascale direct numerical simulations of turbulent combustion using S3D
journal, January 2009

Chen, J. H.; Choudhary, A.; de Supinski, B.
Computational Science & Discovery, Vol. 2, Issue 1
DOI: 10.1088/1749-4699/2/1/015001

Failures in large scale systems: long-term measurement, analysis, and implications
conference, January 2017

Gupta, Saurabh; Patel, Tirthak; Engelmann, Christian
Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC '17)
DOI: 10.1145/3126908.3126937

Efficient, Failure Resilient Transactions for Parallel and Distributed Computing
conference, November 2014

Lofstead, Jay; Dayal, Jai; Jimenez, Ivo
2014 International Workshop on Data Intensive Scalable Computing Systems (DISCS)
DOI: 10.1109/DISCS.2014.13

Post-failure recovery of MPI communication capability: Design and rationale
journal, June 2013

Bland, Wesley; Bouteiller, Aurelien; Herault, Thomas
The International Journal of High Performance Computing Applications, Vol. 27, Issue 3
DOI: 10.1177/1094342013488238

ActiveSpaces: Exploring dynamic code deployment for extreme scale data processing
journal, October 2014

Docan, Ciprian; Zhang, Fan; Jin, Tong
Concurrency and Computation: Practice and Experience, Vol. 27, Issue 14
DOI: 10.1002/cpe.3407

Sizing and Partitioning Strategies for Burst-Buffers to Reduce IO Contention
conference, May 2019

Aupy, Guillaume; Beaumont, Olivier; Eyraud-Dubois, Lionel
2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS)
DOI: 10.1109/IPDPS.2019.00072

SmartBlock: An Approach to Standardizing In Situ Workflow Components
conference, May 2017

Champsaur, Alexis; Lofstead, Jay; Dayal, Jai
2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)
DOI: 10.1109/IPDPSW.2017.149

Scalable Data Resilience for In-memory Data Staging
conference, May 2018

Duan, Shaohua; Subedi, Pradeep; Teranishi, Keita
2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS)
DOI: 10.1109/IPDPS.2018.00021

Local recovery and failure masking for stencil-based applications at extreme scales
conference, January 2015

Gamell, Marc; Teranishi, Keita; Heroux, Michael A.
Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC '15)
DOI: 10.1145/2807591.2807672

Management, analysis, and visualization of experimental and observational data — The convergence of data and computing
conference, October 2016

Bethel, E. Wes; Greenwald, Martin; van Dam, Kerstin Kleese
2016 IEEE 12th International Conference on e-Science (e-Science)
DOI: 10.1109/eScience.2016.7870902

Feature-Based Statistical Analysis of Combustion Simulation Data
journal, December 2011

Bennett, Janine C.; Krishnamoorthy, Vaidyanathan
IEEE Transactions on Visualization and Computer Graphics, Vol. 17, Issue 12
DOI: 10.1109/TVCG.2011.199

Understanding and Exploiting Spatial Properties of System Failures on Extreme-Scale HPC Systems
conference, June 2015

Gupta, Saurabh; Tiwari, Devesh; Jantzi, Christopher
2015 45th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)
DOI: 10.1109/DSN.2015.52

Machine Learning Models for GPU Error Prediction in a Large Scale HPC System
conference, June 2018

Nie, Bin; Xue, Ji; Gupta, Saurabh
2018 48th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)
DOI: 10.1109/DSN.2018.00022

Reducing Waste in Extreme Scale Systems through Introspective Analysis
conference, May 2016

Bautista-Gomez, Leonardo; Gainaru, Ana; Perarnau, Swann
2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS)
DOI: 10.1109/IPDPS.2016.100

A survey of fault tolerance mechanisms and checkpoint/restart implementations for high performance computing systems
journal, February 2013

Egwutuoha, Ifeanyi P.; Levy, David; Selic, Bran
The Journal of Supercomputing, Vol. 65, Issue 3
DOI: 10.1007/s11227-013-0884-0

DataSpaces: an interaction and coordination framework for coupled simulation workflows
journal, February 2011

Docan, Ciprian; Parashar, Manish; Klasky, Scott
Cluster Computing, Vol. 15, Issue 2
DOI: 10.1007/s10586-011-0162-y

Polynomial Codes Over Certain Finite Fields
journal, June 1960

Reed, I. S.; Solomon, G.
Journal of the Society for Industrial and Applied Mathematics, Vol. 8, Issue 2
DOI: 10.1137/0108018

DataSpaces: an interaction and coordination framework for coupled simulation workflows
conference, January 2010

Docan, Ciprian; Parashar, Manish; Klasky, Scott
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing (HPDC '10)
DOI: 10.1145/1851476.1851481

A Comprehensive Analysis of XOR-Based Erasure Codes Tolerating 3 or More Concurrent Failures
conference, May 2013

Subedi, Pradeep; He, Xubin
2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW)
DOI: 10.1109/IPDPSW.2013.155

Harmonia: An Interference-Aware Dynamic I/O Scheduler for Shared Non-volatile Burst Buffers
conference, September 2018

Kougkas, Anthony; Devarajan, Hariharan; Sun, Xian-He
2018 IEEE International Conference on Cluster Computing (CLUSTER)
DOI: 10.1109/CLUSTER.2018.00046

Stacker: An Autonomic Data Movement Engine for Extreme-Scale Data Staging-Based In-Situ Workflows
conference, November 2018

Subedi, Pradeep; Davis, Philip; Duan, Shaohua
SC18: International Conference for High Performance Computing, Networking, Storage and Analysis
DOI: 10.1109/SC.2018.00076

Leveraging burst buffer coordination to prevent I/O interference
conference, October 2016

Kougkas, Anthony; Dorier, Matthieu; Latham, Rob
2016 IEEE 12th International Conference on e-Science (e-Science)
DOI: 10.1109/eScience.2016.7870922

In Situ Visualization for Large-Scale Combustion Simulations
journal, May 2010

Yu, Hongfeng; Grout, Ray W.
IEEE Computer Graphics and Applications, Vol. 30, Issue 3
DOI: 10.1109/MCG.2010.55

Real-Time In-Memory Checkpointing for Future Hybrid Memory Systems
conference, June 2015

Gao, Shen; He, Bingsheng; Xu, Jianliang
Proceedings of the 29th ACM International Conference on Supercomputing (ICS '15)
DOI: 10.1145/2751205.2751212

AnalyzeThis: an analysis workflow-aware storage system
conference, November 2015

Sim, Hyogi; Kim, Youngjae; Vazhkudai, Sudharshan S.
Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC '15)
DOI: 10.1145/2807591.2807622

Toward Managing HPC Burst Buffers Effectively: Draining Strategy to Regulate Bursty I/O Behavior
conference, September 2017

Tang, Kun; Huang, Ping; He, Xubin
2017 IEEE 25th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS)
DOI: 10.1109/MASCOTS.2017.35

Similar Records in DOE PAGES and OSTI.GOV collections:

DataSpaces: an interaction and coordination framework for coupled simulation workflows

Journal Article Docan, Ciprian ; Parashar, Manish ; Klasky, Scott - Cluster Computing

Emerging high-performance distributed computing environments are enabling new end-to-end formulations in science and engineering that involve multiple interacting processes and data-intensive application workflows. For example, current fusion simulation efforts are exploring coupled models and codes that simultaneously simulate separate application processes, such as the core and the edge turbulence. These components run on different high performance computing resources, need to interact at runtime with each other and with services for data monitoring, data analysis and visualization, and data archiving. As a result, they require efficient and scalable support for dynamic and flexible couplings and interactions, which remains a challenge. This…
https://doi.org/10.1007/s10586-011-0162-y

DataSpaces: An Interaction and Coordination Framework for Coupled Simulation Workflows

Conference Docan, Ciprian ; Klasky, Scott A ; Parashar, Manish

Emerging high-performance distributed computing environments are enabling new end-to-end formulations in science and engineering that involve multiple interacting processes and data-intensive application workflows. For example, current fusion simulation efforts are exploring coupled models and codes that simultaneously simulate separate application processes, such as the core and the edge turbulence, and run on different high performance computing resources. These components need to interact, at runtime, with each other and with services for data monitoring, data analysis and visualization, and data archiving. As a result, they require efficient support for dynamic and flexible couplings and interactions, which remains a challenge. This paper…

Optimizing Data Movement for GPU-Based In-Situ Workflow Using GPUDirect RDMA

Conference Zhang, Bo ; Davis, Philip ; Morales, Nicolas ; ...

The extreme-scale computing landscape is increasingly dominated by GPU-accelerated systems. At the same time, in-situ workflows that employ memory-to-memory inter-application data exchanges have emerged as an effective approach for leveraging these extreme-scale systems. In the case of GPUs, GPUDirect RDMA enables third-party devices, such as network interface cards, to access GPU memory directly and has been adopted for intra-application communications across GPUs. In this paper, we present an interoperable framework for GPU-based in-situ workflows that optimizes data movement using GPUDirect RDMA. Specifically, we analyze the characteristics of the possible data movement pathways between GPUs from an in-situ workflow perspective, and…
https://doi.org/10.1007/978-3-031-39698-4_22

Understanding the Impact of Data Staging for Coupled Scientific Workflows

Journal Article Gainaru, Ana ; Wan, Lipeng ; Wang, Ruonan ; ... - IEEE Transactions on Parallel and Distributed Systems

The rate of data generated by cutting-edge experimental science facilities and large-scale simulations enabled by current high-performance computing (HPC) systems has continued to grow at a far greater pace than the development of the network and storage capabilities on which these systems rely. To cope with this challenge, scientists are moving toward the creation of autonomous experiments and HPC simulations using machine learning. However, efficiently moving, storing, and processing large amounts of data away from the point of origin presents an incredible challenge. In-memory computing, in situ analysis, data staging, and data streaming are recognized viable alternatives to…
https://doi.org/10.1109/tpds.2022.3179989

Adaptive elasticity policies for staging-based in situ visualization

Journal Article Wang, Zhe ; Dorier, Matthieu ; Subedi, Pradeep ; ... - Future Generations Computer Systems

In situ processing aims to alleviate the growing gap between computation and I/O capabilities by performing data processing close to the data source. In situ processing is widely used to process data generated by multiple data sources, including observation data from edge devices or scientific observational facilities and the simulation data generated by scientific computation on a high-performance computing (HPC) platform. For a scientific workflow that is run on an HPC platform and composed of a simulation program and an in situ data analytics or visualization (abbreviated as ana/vis) task, there is an implicit assumption that the computing resources assigned…
https://doi.org/10.1016/j.future.2022.12.010
