Title: A Zoom-in Analysis of I/O Logs to Detect Root Causes of I/O Performance Bottlenecks

Abstract

Scientific applications frequently spend a large fraction of their execution time in reading and writing data on parallel file systems. Identifying these I/O performance bottlenecks and attributing root causes are critical steps toward devising optimization strategies. Several existing studies analyze I/O logs of a set of benchmarks or applications that were run with controlled behaviors. However, there is still no general approach that systematically identifies I/O performance bottlenecks for applications running "in the wild" on production systems. In this study, we have developed an analysis approach of "zooming in" from platform-wide to application-wide to job-level I/O logs for identifying I/O bottlenecks in arbitrary scientific applications. We analyze the logs collected on a Cray XC40 system in production over a two-month period. This study results in several insights for application developers to use in optimizing I/O behavior.

Authors:
Wang, Teng; Byna, Suren; Lockwood, Glenn K.; Snyder, Shane; Carns, Philip; Kim, Sunggon; Wright, Nicholas J.
Publication Date:
Research Org.:
Argonne National Lab. (ANL), Argonne, IL (United States)
Sponsoring Org.:
USDOE Office of Science - Office of Advanced Scientific Computing Research
OSTI Identifier:
1562878
DOE Contract Number:  
AC02-06CH11357
Resource Type:
Conference
Resource Relation:
Conference: 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, 05/14/19 - 05/17/19, Larnaca, CY
Country of Publication:
United States
Language:
English

Citation Formats

Wang, Teng, Byna, Suren, Lockwood, Glenn K., Snyder, Shane, Carns, Philip, Kim, Sunggon, and Wright, Nicholas J. A Zoom-in Analysis of I/O Logs to Detect Root Causes of I/O Performance Bottlenecks. United States: N. p., 2019. Web. doi:10.1109/CCGRID.2019.00021.
Wang, Teng, Byna, Suren, Lockwood, Glenn K., Snyder, Shane, Carns, Philip, Kim, Sunggon, & Wright, Nicholas J. A Zoom-in Analysis of I/O Logs to Detect Root Causes of I/O Performance Bottlenecks. United States. doi:10.1109/CCGRID.2019.00021.
Wang, Teng, Byna, Suren, Lockwood, Glenn K., Snyder, Shane, Carns, Philip, Kim, Sunggon, and Wright, Nicholas J. Tue . "A Zoom-in Analysis of I/O Logs to Detect Root Causes of I/O Performance Bottlenecks". United States. doi:10.1109/CCGRID.2019.00021.
@article{osti_1562878,
title = {A Zoom-in Analysis of I/O Logs to Detect Root Causes of I/O Performance Bottlenecks},
author = {Wang, Teng and Byna, Suren and Lockwood, Glenn K. and Snyder, Shane and Carns, Philip and Kim, Sunggon and Wright, Nicholas J.},
abstractNote = {Scientific applications frequently spend a large fraction of their execution time in reading and writing data on parallel file systems. Identifying these I/O performance bottlenecks and attributing root causes are critical steps toward devising optimization strategies. Several existing studies analyze I/O logs of a set of benchmarks or applications that were run with controlled behaviors. However, there is still no general approach that systematically identifies I/O performance bottlenecks for applications running "in the wild" on production systems. In this study, we have developed an analysis approach of "zooming in" from platform-wide to application-wide to job-level I/O logs for identifying I/O bottlenecks in arbitrary scientific applications. We analyze the logs collected on a Cray XC40 system in production over a two-month period. This study results in several insights for application developers to use in optimizing I/O behavior.},
doi = {10.1109/CCGRID.2019.00021},
journal = {},
number = {},
volume = {},
place = {United States},
year = {2019},
month = {1}
}
