OSTI.GOV · U.S. Department of Energy
Office of Scientific and Technical Information

Title: ExaHDF5: Delivering Efficient Parallel I/O on Exascale Computing Systems

Journal Article · Journal of Computer Science and Technology
 [1];  [2];  [1];  [1];  [2];  [2];  [2];  [1];  [3];  [2]
  1. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
  2. The HDF Group, Champaign, IL (United States)
  3. Argonne National Lab. (ANL), Lemont, IL (United States)

Scientific applications at exascale generate and analyze massive amounts of data. A critical requirement of these applications is the capability to access and manage this data efficiently on exascale systems. Parallel I/O, the key technology that enables moving data between compute nodes and storage, therefore faces monumental challenges from the new applications, memory, and storage architectures considered in exascale system designs. As the storage hierarchy expands to include layers such as node-local persistent memory and burst buffers alongside disk-based storage, data movement among these layers must be efficient. Future parallel I/O libraries should be able to handle files of many terabytes and beyond. In this paper, we describe new capabilities we have developed in Hierarchical Data Format version 5 (HDF5), the most popular parallel I/O library for scientific applications. HDF5 is one of the most widely used libraries for performing parallel I/O on existing HPC systems at the leadership computing facilities. The state-of-the-art features we describe include the Virtual Object Layer (VOL), Data Elevator, asynchronous I/O, full-featured single-writer and multiple-reader (Full SWMR), and parallel querying. We introduce these features and their implementations, and we present their performance and functionality benefits to applications and other libraries.

Research Organization: Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Sponsoring Organization: USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR); USDOE National Nuclear Security Administration (NNSA); USDOE Office of Science (SC), Basic Energy Sciences (BES). Scientific User Facilities Division
Grant/Contract Number: AC02-05CH11231; 17-SC-20-SC; AC02-06CH11357
OSTI ID: 1582374
Journal Information: Journal of Computer Science and Technology, Vol. 35, Issue 1; ISSN 1000-9000
Publisher: Springer Nature
Country of Publication: United States
Language: English
