OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Taming parallel I/O complexity with auto-tuning

Abstract

We present an auto-tuning system for optimizing I/O performance of HDF5 applications and demonstrate its value across platforms, applications, and at scale. The system uses a genetic algorithm to search a large space of tunable parameters and to identify effective settings at all layers of the parallel I/O stack. The parameter settings are applied transparently by the auto-tuning system via dynamically intercepted HDF5 calls. To validate our auto-tuning system, we applied it to three I/O benchmarks (VPIC, VORPAL, and GCRM) that replicate the I/O activity of their respective applications. We tested the system with different weak-scaling configurations (128, 2048, and 4096 CPU cores) that generate 30 GB to 1 TB of data, and executed these configurations on diverse HPC platforms (Cray XE6, IBM BG/P, and Dell Cluster). In all cases, the auto-tuning framework identified tunable parameters that substantially improved write performance over default system settings. In conclusion, we consistently demonstrate I/O write speedups between 2x and 100x for test configurations.
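
The tunable parameters in question span several layers of the parallel I/O stack (file-system striping, MPI-IO collective buffering, HDF5 alignment and chunking). As a rough, hand-written illustration of the kind of settings the auto-tuner searches over, the C sketch below applies a few such parameters through the standard HDF5 and MPI-IO interfaces. The specific hint values, chunk sizes, and the file name "tuned.h5" are hypothetical examples, not settings chosen by the paper's framework, which applies its selections transparently by intercepting HDF5 calls rather than by modifying application source.

/* Hedged sketch: manually setting example I/O-stack tuning parameters of the
 * kind the auto-tuner explores. All values below are illustrative only. */
#include <hdf5.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* MPI-IO layer: collective-buffering and (on Lustre) striping hints.
     * Hint names are standard ROMIO hints; the values are hypothetical. */
    MPI_Info info;
    MPI_Info_create(&info);
    MPI_Info_set(info, "cb_nodes", "8");             /* number of aggregators */
    MPI_Info_set(info, "striping_factor", "32");     /* Lustre OST count */
    MPI_Info_set(info, "striping_unit", "4194304");  /* 4 MiB stripes */

    /* HDF5 file-access layer: parallel (MPI-IO) driver, plus alignment of
     * file objects to the stripe size. */
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, info);
    H5Pset_alignment(fapl, 0, 4194304);              /* align to 4 MiB */

    hid_t file = H5Fcreate("tuned.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

    /* HDF5 dataset layer: chunked layout sized to match the stripe unit
     * (524288 doubles = 4 MiB per chunk). */
    hsize_t dims[1]  = {1 << 24};
    hsize_t chunk[1] = {1 << 19};
    hid_t space = H5Screate_simple(1, dims, NULL);
    hid_t dcpl  = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_chunk(dcpl, 1, chunk);
    hid_t dset  = H5Dcreate(file, "data", H5T_NATIVE_DOUBLE, space,
                            H5P_DEFAULT, dcpl, H5P_DEFAULT);

    /* ... collective H5Dwrite calls would go here ... */

    H5Dclose(dset);
    H5Pclose(dcpl);
    H5Sclose(space);
    H5Fclose(file);
    H5Pclose(fapl);
    MPI_Info_free(&info);
    MPI_Finalize();
    return 0;
}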

Authors:
 Behzad, Babak [1]; Luu, Huong Vu Thanh [1]; Huchette, Joseph [2]; Byna, Surendra [3]; Prabhat [3]; Aydt, Ruth [4]; Koziol, Quincey [4]; Snir, Marc [1]
  1. Univ. of Illinois, Urbana-Champaign, IL (United States)
  2. Rice Univ., Houston, TX (United States)
  3. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
  4. The HDF Group, Champaign, IL (United States)
Publication Date:
2013-11-17
Research Org.:
Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Sponsoring Org.:
Computational Research Division; USDOE
OSTI Identifier:
1311633
Report Number(s):
LBNL-1005953
Journal ID: ISSN 1063-9635; ir:1005953
Grant/Contract Number:  
AC02-06CH11357
Resource Type:
Journal Article: Accepted Manuscript
Journal Name:
Proceedings of the ACM/IEEE Supercomputing Conference
Additional Journal Information:
Journal Volume: 2013; Conference: SC13-International Conference for High Performance Computing, Networking, Storage and Analysis, Denver, CO (United States), 17-22 Nov 2013; Journal ID: ISSN 1063-9635
Publisher:
ACM/IEEE
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; parallel I/O; auto-tuning; performance optimization; parallel file systems

Citation Formats

Behzad, Babak, Luu, Huong Vu Thanh, Huchette, Joseph, Byna, Surendra, Prabhat, Aydt, Ruth, Koziol, Quincey, and Snir, Marc. Taming parallel I/O complexity with auto-tuning. United States: N. p., 2013. Web. doi:10.1145/2503210.2503278.
Behzad, Babak, Luu, Huong Vu Thanh, Huchette, Joseph, Byna, Surendra, Prabhat, Aydt, Ruth, Koziol, Quincey, & Snir, Marc. Taming parallel I/O complexity with auto-tuning. United States. https://doi.org/10.1145/2503210.2503278
Behzad, Babak, Luu, Huong Vu Thanh, Huchette, Joseph, Byna, Surendra, Prabhat, Aydt, Ruth, Koziol, Quincey, and Snir, Marc. 2013. "Taming parallel I/O complexity with auto-tuning". United States. https://doi.org/10.1145/2503210.2503278. https://www.osti.gov/servlets/purl/1311633.
@article{osti_1311633,
title = {Taming parallel I/O complexity with auto-tuning},
author = {Behzad, Babak and Luu, Huong Vu Thanh and Huchette, Joseph and Byna, Surendra and Prabhat and Aydt, Ruth and Koziol, Quincey and Snir, Marc},
abstractNote = {We present an auto-tuning system for optimizing I/O performance of HDF5 applications and demonstrate its value across platforms, applications, and at scale. The system uses a genetic algorithm to search a large space of tunable parameters and to identify effective settings at all layers of the parallel I/O stack. The parameter settings are applied transparently by the auto-tuning system via dynamically intercepted HDF5 calls. To validate our auto-tuning system, we applied it to three I/O benchmarks (VPIC, VORPAL, and GCRM) that replicate the I/O activity of their respective applications. We tested the system with different weak-scaling configurations (128, 2048, and 4096 CPU cores) that generate 30 GB to 1 TB of data, and executed these configurations on diverse HPC platforms (Cray XE6, IBM BG/P, and Dell Cluster). In all cases, the auto-tuning framework identified tunable parameters that substantially improved write performance over default system settings. In conclusion, we consistently demonstrate I/O write speedups between 2x and 100x for test configurations.},
doi = {10.1145/2503210.2503278},
url = {https://www.osti.gov/biblio/1311633},
journal = {Proceedings of the ACM/IEEE Supercomputing Conference},
issn = {1063-9635},
volume = {2013},
place = {United States},
year = {2013},
month = {nov}
}