DOE Patents, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Architecture and method for a burst buffer using flash technology

Abstract

A parallel supercomputing cluster includes compute nodes interconnected in a mesh of data links for executing an MPI job, solid-state storage nodes each linked to a respective group of the compute nodes for receiving checkpoint data from those compute nodes, and magnetic disk storage linked to each of the solid-state storage nodes for asynchronous migration of the checkpoint data from the solid-state storage nodes to the magnetic disk storage. Each solid-state storage node presents a file system interface to the MPI job, and multiple MPI processes of the MPI job write the checkpoint data to a shared file in the solid-state storage in a strided fashion. The solid-state storage node then asynchronously migrates the checkpoint data from the shared file to the magnetic disk storage, writing it there in a sequential fashion.
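
The write pattern in the abstract can be illustrated with a small sketch. It is not the patented implementation: threads stand in for MPI processes, two temporary local files stand in for the solid-state tier and the magnetic disk tier, and every name (`strided_offset`, `write_checkpoint`, `migrate`, `run_demo`, the block and rank counts) is hypothetical. It only shows several writers placing blocks at strided offsets in one shared file, followed by a separate mover copying that file front to back.

```python
import os
import tempfile
import threading

BLOCK = 4            # bytes per block (tiny, for illustration only)
RANKS = 3            # number of simulated MPI processes (assumption)
BLOCKS_PER_RANK = 2  # blocks each simulated process writes (assumption)


def strided_offset(rank: int, i: int) -> int:
    """Byte offset of the i-th block written by `rank` in the shared file."""
    return (i * RANKS + rank) * BLOCK


def write_checkpoint(path: str, rank: int) -> None:
    """One simulated process writes its blocks at strided offsets."""
    with open(path, "r+b") as f:
        for i in range(BLOCKS_PER_RANK):
            f.seek(strided_offset(rank, i))
            f.write(bytes([ord("A") + rank]) * BLOCK)


def migrate(src: str, dst: str) -> None:
    """Copy the shared file to the slower tier front to back (sequential writes)."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while chunk := fin.read(BLOCK):
            fout.write(chunk)


def run_demo() -> bytes:
    with tempfile.TemporaryDirectory() as d:
        flash = os.path.join(d, "flash.ckpt")  # stand-in for the solid-state tier
        disk = os.path.join(d, "disk.ckpt")    # stand-in for the magnetic disk tier

        # Preallocate the shared file so every writer can seek into it.
        with open(flash, "wb") as f:
            f.truncate(RANKS * BLOCKS_PER_RANK * BLOCK)

        writers = [
            threading.Thread(target=write_checkpoint, args=(flash, r))
            for r in range(RANKS)
        ]
        for t in writers:
            t.start()
        for t in writers:
            t.join()

        # The mover runs on its own thread; a real job could resume computing
        # here instead of waiting. The sketch joins only to read the result.
        mover = threading.Thread(target=migrate, args=(flash, disk))
        mover.start()
        mover.join()

        with open(disk, "rb") as f:
            return f.read()


if __name__ == "__main__":
    print(run_demo())
```

With three writers and two blocks each, the blocks interleave in the shared file as rank 0, rank 1, rank 2, rank 0, rank 1, rank 2, and the mover reproduces that layout on the second tier with purely sequential writes.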

Inventors:
Tzelnic, Percy; Faibish, Sorin; Gupta, Uday K.; Bent, John; Grider, Gary Alan; Chen, Hsing-bung
Issue Date:
2016 Mar 15
Research Org.:
Los Alamos National Laboratory (LANL), Los Alamos, NM (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1243041
Patent Number(s):
9286261
Application Number:
13/676,000
Assignee:
EMC Corporation (Hopkinton, MA); Los Alamos National Security, LLC (Los Alamos, NM)
Patent Classifications (CPCs):
G - PHYSICS; G06 - COMPUTING; G06F - ELECTRIC DIGITAL DATA PROCESSING
DOE Contract Number:  
AC52-06NA25396
Resource Type:
Patent
Resource Relation:
Patent File Date: 2012 Nov 13
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING

Citation Formats

Tzelnic, Percy, Faibish, Sorin, Gupta, Uday K., Bent, John, Grider, Gary Alan, and Chen, Hsing-bung. Architecture and method for a burst buffer using flash technology. United States: N. p., 2016. Web.
Tzelnic, Percy, Faibish, Sorin, Gupta, Uday K., Bent, John, Grider, Gary Alan, & Chen, Hsing-bung. Architecture and method for a burst buffer using flash technology. United States.
Tzelnic, Percy, Faibish, Sorin, Gupta, Uday K., Bent, John, Grider, Gary Alan, and Chen, Hsing-bung. 2016. "Architecture and method for a burst buffer using flash technology". United States. https://www.osti.gov/servlets/purl/1243041.
@article{osti_1243041,
title = {Architecture and method for a burst buffer using flash technology},
author = {Tzelnic, Percy and Faibish, Sorin and Gupta, Uday K. and Bent, John and Grider, Gary Alan and Chen, Hsing-bung},
abstractNote = {A parallel supercomputing cluster includes compute nodes interconnected in a mesh of data links for executing an MPI job, and solid-state storage nodes each linked to a respective group of the compute nodes for receiving checkpoint data from the respective compute nodes, and magnetic disk storage linked to each of the solid-state storage nodes for asynchronous migration of the checkpoint data from the solid-state storage nodes to the magnetic disk storage. Each solid-state storage node presents a file system interface to the MPI job, and multiple MPI processes of the MPI job write the checkpoint data to a shared file in the solid-state storage in a strided fashion, and the solid-state storage node asynchronously migrates the checkpoint data from the shared file in the solid-state storage to the magnetic disk storage and writes the checkpoint data to the magnetic disk storage in a sequential fashion.},
place = {United States},
year = {2016},
month = {3}
}
