skip to main content
OSTI.GOV title logo U.S. Department of Energy
Office of Scientific and Technical Information

Title: Tuning HDF5 subfiling performance on parallel file systems

Abstract

Subfiling is a technique used on parallel file systems to reduce locking and contention issues when multiple compute nodes interact with the same storage target node. Subfiling provides a compromise between the single shared file approach that instigates the lock contention problems on parallel file systems and having one file per process, which results in generating a massive and unmanageable number of files. In this paper, we evaluate and tune the performance of recently implemented subfiling feature in HDF5. In specific, we explain the implementation strategy of subfiling feature in HDF5, provide examples of using the feature, and evaluate and tune parallel I/O performance of this feature with parallel file systems of the Cray XC40 system at NERSC (Cori) that include a burst buffer storage and a Lustre disk-based storage. We also evaluate I/O performance on the Cray XC30 system, Edison, at NERSC. Our results show performance benefits of 1.2X to 6X performance advantage with subfiling compared to writing a single shared HDF5 file. We present our exploration of configurations, such as the number of subfiles and the number of Lustre storage targets to storing files, as optimization parameters to obtain superior I/O performance. Based on this exploration, we discussmore » recommendations for achieving good I/O performance as well as limitations with using the subfiling feature.« less

Authors:
 [1];  [2];  [1];  [3];  [3]
  1. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
  2. Intel Corp. (United States)
  3. The HDF Group (United States)
Publication Date:
Research Org.:
Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR) (SC-21)
OSTI Identifier:
1398484
DOE Contract Number:
AC02-05CH11231
Resource Type:
Conference
Resource Relation:
Conference: Cray User Group Meeting, Redmond, WA (United States), 8-11 May 2017
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING

Citation Formats

Byna, Suren, Chaarawi, Mohamad, Koziol, Quincey, Mainzer, John, and Willmore, Frank. Tuning HDF5 subfiling performance on parallel file systems. United States: N. p., 2017. Web.
Byna, Suren, Chaarawi, Mohamad, Koziol, Quincey, Mainzer, John, & Willmore, Frank. Tuning HDF5 subfiling performance on parallel file systems. United States.
Byna, Suren, Chaarawi, Mohamad, Koziol, Quincey, Mainzer, John, and Willmore, Frank. Fri . "Tuning HDF5 subfiling performance on parallel file systems". United States. doi:. https://www.osti.gov/servlets/purl/1398484.
@article{osti_1398484,
title = {Tuning HDF5 subfiling performance on parallel file systems},
author = {Byna, Suren and Chaarawi, Mohamad and Koziol, Quincey and Mainzer, John and Willmore, Frank},
abstractNote = {Subfiling is a technique used on parallel file systems to reduce locking and contention issues when multiple compute nodes interact with the same storage target node. Subfiling provides a compromise between the single shared file approach that instigates the lock contention problems on parallel file systems and having one file per process, which results in generating a massive and unmanageable number of files. In this paper, we evaluate and tune the performance of recently implemented subfiling feature in HDF5. In specific, we explain the implementation strategy of subfiling feature in HDF5, provide examples of using the feature, and evaluate and tune parallel I/O performance of this feature with parallel file systems of the Cray XC40 system at NERSC (Cori) that include a burst buffer storage and a Lustre disk-based storage. We also evaluate I/O performance on the Cray XC30 system, Edison, at NERSC. Our results show performance benefits of 1.2X to 6X performance advantage with subfiling compared to writing a single shared HDF5 file. We present our exploration of configurations, such as the number of subfiles and the number of Lustre storage targets to storing files, as optimization parameters to obtain superior I/O performance. Based on this exploration, we discuss recommendations for achieving good I/O performance as well as limitations with using the subfiling feature.},
doi = {},
journal = {},
number = ,
volume = ,
place = {United States},
year = {Fri May 12 00:00:00 EDT 2017},
month = {Fri May 12 00:00:00 EDT 2017}
}

Conference:
Other availability
Please see Document Availability for additional information on obtaining the full-text document. Library patrons may search WorldCat to identify libraries that hold this conference proceeding.

Save / Share:
  • HDF5 is a cross-platform parallel I/O library that is used by a wide variety of HPC applications for the flexibility of its hierarchical object-database representation of scientific data. We describe our recent work to optimize the performance of the HDF5 and MPI-IO libraries for the Lustre parallel file system. We selected three different HPC applications to represent the diverse range of I/O requirements, and measured their performance on three different systems to demonstrate the robustness of our optimizations across different file system configurations and to validate our optimization strategy. We demonstrate that the combined optimizations improve HDF5 parallel I/O performancemore » by up to 33 times in some cases running close to the achievable peak performance of the underlying file system and demonstrate scalable performance up to 40,960-way concurrency.« less
  • Collective I/O is a widely used technique to improve I/O performance in parallel computing. It can be implemented as a client-based or server-based scheme. The client-based implementation is more widely adopted in MPI-IO software such as ROMIO because of its independence from the storage system configuration and its greater portability. However, existing implementations of client-side collective I/O do not take into account the actual pattern offile striping over multiple I/O nodes in the storage system. This can cause a significant number of requests for non-sequential data at I/O nodes, substantially degrading I/O performance. Investigating the surprisingly high I/O throughput achievedmore » when there is an accidental match between a particular request pattern and the data striping pattern on the I/O nodes, we reveal the resonance phenomenon as the cause. Exploiting readily available information on data striping from the metadata server in popular file systems such as PVFS2 and Lustre, we design a new collective I/O implementation technique, resonant I/O, that makes resonance a common case. Resonant I/O rearranges requests from multiple MPI processes to transform non-sequential data accesses on I/O nodes into sequential accesses, significantly improving I/O performance without compromising the independence ofa client-based implementation. We have implemented our design in ROMIO. Our experimental results show that the scheme can increase I/O throughput for some commonly used parallel I/O benchmarks such as mpi-io-test and ior-mpi-io over the existing implementation of ROMIO by up to 157%, with no scenario demonstrating significantly decreased performance.« less
  • The high-performance parallel input/output subsystems provided on the Intel iPSC/2 and the NCUBE/ten are compared and contrasted. Both the hardware and software architectures of the two subsystems differ substantially. Preliminary performance measurements on the two systems are reported. 4 refs., 8 refs.
  • Parallel I/O plays an increasingly important role in today's data intensive computing applications. While much attention has been paid to parallel read performance, most of this work has focused on the parallel file system, middleware, or application layers, ignoring the potential for improvement through more effective use of local storage. In this paper, we present the design and implementation of segment-structured on-disk data grouping and prefetching (SOGP), a technique that leverages additional local storage to boost the local data read performance for parallel file systems, especially for those applications with partially overlapped access patterns. Parallel virtual file system (PVFS) ismore » chosen as an example. Our experiments show that an SOGP-enhanced PVFS prototype system can outperform a traditional Linux-Ext3-based PVFS for many applications and benchmarks, in some tests by as much as 230% in terms of I/O bandwidth.« less