I/O Bottleneck Detection and Tuning: Connecting the Dots using Interactive Log Analysis

Bez, Jean Luca; Tang, Houjun; Xie, Bing; Williams-Young, David; Latham, Rob; Ross, Rob; Oral, Sarp; Byna, Suren

doi:10.1109/PDSW54622.2021.00008

Conference · Mon Nov 01 04:00:00 EDT 2021

DOI: https://doi.org/10.1109/PDSW54622.2021.00008 · OSTI ID: 1871120

Bez, Jean Luca ^[1]; Tang, Houjun ^[2]; Xie, Bing ^[3]; Williams-Young, David ^[1]; Latham, Rob ^[4]; Ross, Rob ^[4]; Oral, Sarp ^[3]; Byna, Suren ^[2]

^[1] Lawrence Berkeley National Laboratory (LBNL)
^[2] Lawrence Berkeley Laboratory, CA
^[3] ORNL
^[4] Argonne National Laboratory (ANL)

Using parallel file systems efficiently is a tricky problem due to inter-dependencies among multiple layers of I/O software, including high-level I/O libraries (HDF5, netCDF, etc.), MPI-IO, POSIX, and file systems (GPFS, Lustre, etc.). Profiling tools such as Darshan collect traces to help understand the I/O performance behavior. However, there are significant gaps in analyzing the collected traces and then applying tuning options offered by various layers of I/O software. Seeking to connect the dots between I/O bottleneck detection and tuning, we propose DXT Explorer, an interactive log analysis tool. In this paper, we present a case study using our interactive log analysis tool to identify and apply various I/O optimizations. We report an evaluation of performance improvement achieved for four I/O kernels extracted from science applications.

Research Organization: Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)

Sponsoring Organization: USDOE

DOE Contract Number: AC05-00OR22725

OSTI ID: 1871120

Country of Publication: United States

Language: English

Similar Records

I/O Bottleneck Detection and Tuning: Connecting the Dots using Interactive Log Analysis

Conference · Mon Nov 01 00:00:00 EDT 2021 · 2021 IEEE/ACM Sixth International Parallel Data Systems Workshop (PDSW) · OSTI ID:1959023

Scalable I/O Tracing and Analysis

Conference · Wed Dec 31 23:00:00 EST 2008 · OSTI ID:986831

High Performance Computing Application I/O Traces

Dataset · Sat Jun 06 00:00:00 EDT 2020 · OSTI ID:1785979
