U.S. Department of Energy
Office of Scientific and Technical Information

Interfacing HDF5 with a scalable object‐centric storage system on hierarchical storage

Journal Article · Concurrency and Computation. Practice and Experience
DOI: https://doi.org/10.1002/cpe.5715 · OSTI ID: 1603709
Summary

Object storage technologies that take advantage of multitier storage on HPC systems are emerging. At present, however, applications built on current I/O libraries must be modified significantly to use these technologies. HDF5, a widely used I/O middleware on HPC systems, provides a virtual object layer (VOL) that allows applications to connect to different storage mechanisms transparently without requiring significant code modifications. We recently designed the proactive data containers (PDC) object‐centric storage system, which provides transparent, asynchronous, and autonomous data movement across multiple storage tiers, a decision that most current systems leave to the user. To enable PDC's features through HDF5 without modifying application codes, we have developed an HDF5 VOL connector that interfaces with PDC. In this article, we present the connector interface and evaluate its performance on Cori, a Cray XC40 supercomputer located at the National Energy Research Scientific Computing Center (NERSC). Our evaluation demonstrates up to an 8× improvement compared with HDF5 with its most recent optimizations.
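The idea behind a connector layer can be illustrated with a small toy model. The sketch below is not the HDF5 VOL API or the PDC connector described in the article. Every name in it (`Connector`, `NativeConnector`, `TieredConnector`, `TOY_VOL_CONNECTOR`, `open_storage`, `application`) is hypothetical and chosen only for illustration. It shows, under those assumptions, how application code can stay unchanged while the storage backend is selected at run time.

```python
import os
from abc import ABC, abstractmethod


class Connector(ABC):
    """Storage backend interface (toy stand-in for a VOL connector)."""

    @abstractmethod
    def write(self, name, data):
        ...

    @abstractmethod
    def read(self, name):
        ...


class NativeConnector(Connector):
    """Keeps every object in one flat store, like a single-tier file."""

    def __init__(self):
        self.store = {}

    def write(self, name, data):
        self.store[name] = data

    def read(self, name):
        return self.store[name]


class TieredConnector(Connector):
    """Writes land in a fast tier; flush() moves them to a slow tier."""

    def __init__(self):
        self.fast = {}
        self.slow = {}

    def write(self, name, data):
        self.fast[name] = data

    def read(self, name):
        # Prefer the fast tier, fall back to the slow tier.
        return self.fast[name] if name in self.fast else self.slow[name]

    def flush(self):
        # Toy version of moving data between tiers without user involvement.
        self.slow.update(self.fast)
        self.fast.clear()


# Hypothetical registry: maps a connector name to its implementation.
CONNECTORS = {"native": NativeConnector, "tiered": TieredConnector}


def open_storage(env=None):
    """Pick a connector from an environment mapping (hypothetical variable)."""
    env = os.environ if env is None else env
    return CONNECTORS[env.get("TOY_VOL_CONNECTOR", "native")]()


def application(storage):
    """Application logic: identical regardless of the chosen connector."""
    storage.write("/particles/x", b"\x01\x02\x03")
    return storage.read("/particles/x")
```

Calling `application(open_storage({"TOY_VOL_CONNECTOR": "tiered"}))` runs the same application code against the tiered backend. In this toy model the dispatch is a dictionary lookup; the real connector mechanism is documented in the HDF5 VOL materials rather than in this abstract.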

Sponsoring Organization:
USDOE
Grant/Contract Number:
AC02-05CH11231; SC0016454
OSTI ID:
1603709
Journal Information:
Journal Name: Concurrency and Computation. Practice and Experience; Journal Volume: 32; Journal Issue: 20; ISSN 1532-0626
Publisher:
Wiley Blackwell (John Wiley & Sons)
Country of Publication:
United Kingdom
Language:
English
