Title: Comprehensive Measurement and Analysis of the User-Perceived I/O Performance in a Production Leadership-Class Storage System

Abstract

With the growing scale and intensity of the parallel I/O workloads generated by scientific applications running on high performance computing facilities, understanding I/O dynamics, especially the root cause of I/O performance variability and degradation in HPC environments, has become extremely critical to the HPC community. In this paper, we run extensive I/O measurement tests on a production leadership-class storage system to capture the performance variability of large-scale parallel I/O. Analyzing these results and their statistical correlations reveals valuable insights into the characteristics of the storage system and the root cause of I/O performance variability. Further, we leverage these findings to propose an I/O middleware design refactoring that can improve parallel I/O performance by optimizing data striping and placement. Our preliminary evaluation results demonstrate that the proposed approach can reduce the average per-process write latency by at least 80% and the maximum per-process write latency by at least 20%.

Authors:
Wan, Lipeng [1]; Wolf, Matthew D. [1]; Wang, Feiyi [1]; Choi, Jong Youl [1]; Ostrouchov, George [1]; Klasky, Scott A. [1]
  1. ORNL
Publication Date:
Research Org.:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF)
Sponsoring Org.:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR) (SC-21)
OSTI Identifier:
1474694
DOE Contract Number:  
AC05-00OR22725
Resource Type:
Conference
Resource Relation:
Conference: 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS) - Atlanta, Georgia, United States of America - 6/5/2017 8:00:00 AM-8/8/2017 4:00:00 AM
Country of Publication:
United States
Language:
English

Citation Formats

Wan, Lipeng, Wolf, Matthew D., Wang, Feiyi, Choi, Jong Youl, Ostrouchov, George, and Klasky, Scott A. Comprehensive Measurement and Analysis of the User-Perceived I/O Performance in a Production Leadership-Class Storage System. United States: N. p., 2017. Web. doi:10.1109/ICDCS.2017.257.
Wan, Lipeng, Wolf, Matthew D., Wang, Feiyi, Choi, Jong Youl, Ostrouchov, George, & Klasky, Scott A. Comprehensive Measurement and Analysis of the User-Perceived I/O Performance in a Production Leadership-Class Storage System. United States. doi:10.1109/ICDCS.2017.257.
Wan, Lipeng, Wolf, Matthew D., Wang, Feiyi, Choi, Jong Youl, Ostrouchov, George, and Klasky, Scott A. 2017. "Comprehensive Measurement and Analysis of the User-Perceived I/O Performance in a Production Leadership-Class Storage System". United States. doi:10.1109/ICDCS.2017.257. https://www.osti.gov/servlets/purl/1474694.
@inproceedings{osti_1474694,
title = {Comprehensive Measurement and Analysis of the User-Perceived I/O Performance in a Production Leadership-Class Storage System},
author = {Wan, Lipeng and Wolf, Matthew D. and Wang, Feiyi and Choi, Jong Youl and Ostrouchov, George and Klasky, Scott A.},
abstractNote = {With the growing scale and intensity of the parallel I/O workloads generated by scientific applications running on high performance computing facilities, understanding I/O dynamics, especially the root cause of I/O performance variability and degradation in HPC environments, has become extremely critical to the HPC community. In this paper, we run extensive I/O measurement tests on a production leadership-class storage system to capture the performance variability of large-scale parallel I/O. Analyzing these results and their statistical correlations reveals valuable insights into the characteristics of the storage system and the root cause of I/O performance variability. Further, we leverage these findings to propose an I/O middleware design refactoring that can improve parallel I/O performance by optimizing data striping and placement. Our preliminary evaluation results demonstrate that the proposed approach can reduce the average per-process write latency by at least 80% and the maximum per-process write latency by at least 20%.},
doi = {10.1109/ICDCS.2017.257},
booktitle = {2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS)},
place = {United States},
year = {2017},
month = {6}
}
