A Zoom-in Analysis of I/O Logs to Detect Root Causes of I/O Performance Bottlenecks
Scientific applications frequently spend a large fraction of their execution time reading and writing data on parallel file systems. Identifying these I/O performance bottlenecks and attributing their root causes are critical steps toward devising optimization strategies. Several existing studies analyze I/O logs from benchmarks or applications that were run under controlled conditions. However, there is still no general approach that systematically identifies I/O performance bottlenecks for applications running "in the wild" on production systems. In this study, we develop an analysis approach that "zooms in" from platform-wide to application-wide to job-level I/O logs to identify I/O bottlenecks in arbitrary scientific applications. We analyze logs collected on a production Cray XC40 system over a two-month period. The study yields several insights that application developers can use to optimize I/O behavior.
- Research Organization: Argonne National Lab. (ANL), Argonne, IL (United States)
- Sponsoring Organization: USDOE Office of Science - Office of Advanced Scientific Computing Research
- DOE Contract Number: AC02-06CH11357
- OSTI ID: 1562878
- Resource Relation: Conference: 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, 05/14/19 - 05/17/19, Larnaca, CY
- Country of Publication: United States
- Language: English