Demystifying asynchronous I/O Interference in HPC applications
Abstract
With increasing complexity of HPC workflows, data management services need to perform expensive I/O operations asynchronously in the background, aiming to overlap the I/O with the application runtime. However, this may cause interference due to competition for resources: CPU, memory/network bandwidth. The advent of multi-core architectures has exacerbated this problem, as many I/O operations are issued concurrently, thereby competing not only with the application but also among themselves. Furthermore, the interference patterns can dynamically change as a response to variations in application behavior and I/O subsystems (e.g. multiple users sharing a parallel file system). Without a thorough understanding, I/O operations may perform suboptimally, potentially even worse than in the blocking case. To fill this gap, here we investigate the causes and consequences of interference due to asynchronous I/O on HPC systems. Specifically, we focus on multi-core CPUs and memory bandwidth, isolating the interference due to each resource. Then, we perform an in-depth study to explain the interplay and contention in a variety of resource sharing scenarios such as varying priority and number of background I/O threads and different I/O strategies: sendfile, read/write, mmap/write underlining trade-offs. The insights from this study are important both to enable guided optimizations of existing background I/O, as well as to open new opportunities to design advanced asynchronous I/O strategies.
- Authors:
- Tseng, Shu-Mei; Nicolae, Bogdan; Cappello, Franck; Chandramowlishwaran, Aparna
- Univ. of California, Irvine, CA (United States)
- Argonne National Lab. (ANL), Argonne, IL (United States)
- Publication Date:
- May 2021
- Research Org.:
- Argonne National Lab. (ANL), Argonne, IL (United States)
- Sponsoring Org.:
- USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)
- OSTI Identifier:
- 1831116
- Grant/Contract Number:
- AC02-06CH11357
- Resource Type:
- Journal Article: Accepted Manuscript
- Journal Name:
- International Journal of High Performance Computing Applications
- Additional Journal Information:
- Journal Volume: 35; Journal Issue: 4; Journal ID: ISSN 1094-3420
- Publisher:
- SAGE
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 97 MATHEMATICS AND COMPUTING; I/O interference; asynchronous and concurrent I/O; checkpointing; HPC applications; performance analysis
Citation Formats
Tseng, Shu-Mei, Nicolae, Bogdan, Cappello, Franck, and Chandramowlishwaran, Aparna. Demystifying asynchronous I/O Interference in HPC applications. United States: N. p., 2021. Web. doi:10.1177/10943420211016511.
Tseng, Shu-Mei, Nicolae, Bogdan, Cappello, Franck, & Chandramowlishwaran, Aparna. Demystifying asynchronous I/O Interference in HPC applications. United States. https://doi.org/10.1177/10943420211016511
Tseng, Shu-Mei, Nicolae, Bogdan, Cappello, Franck, and Chandramowlishwaran, Aparna. 2021. "Demystifying asynchronous I/O Interference in HPC applications". United States. https://doi.org/10.1177/10943420211016511. https://www.osti.gov/servlets/purl/1831116.
@article{osti_1831116,
title = {Demystifying asynchronous I/O Interference in HPC applications},
author = {Tseng, Shu-Mei and Nicolae, Bogdan and Cappello, Franck and Chandramowlishwaran, Aparna},
abstractNote = {With increasing complexity of HPC workflows, data management services need to perform expensive I/O operations asynchronously in the background, aiming to overlap the I/O with the application runtime. However, this may cause interference due to competition for resources: CPU, memory/network bandwidth. The advent of multi-core architectures has exacerbated this problem, as many I/O operations are issued concurrently, thereby competing not only with the application but also among themselves. Furthermore, the interference patterns can dynamically change as a response to variations in application behavior and I/O subsystems (e.g. multiple users sharing a parallel file system). Without a thorough understanding, I/O operations may perform suboptimally, potentially even worse than in the blocking case. To fill this gap, here we investigate the causes and consequences of interference due to asynchronous I/O on HPC systems. Specifically, we focus on multi-core CPUs and memory bandwidth, isolating the interference due to each resource. Then, we perform an in-depth study to explain the interplay and contention in a variety of resource sharing scenarios such as varying priority and number of background I/O threads and different I/O strategies: sendfile, read/write, mmap/write underlining trade-offs. The insights from this study are important both to enable guided optimizations of existing background I/O, as well as to open new opportunities to design advanced asynchronous I/O strategies.},
doi = {10.1177/10943420211016511},
url = {https://www.osti.gov/biblio/1831116},
journal = {International Journal of High Performance Computing Applications},
issn = {1094-3420},
number = 4,
volume = 35,
place = {United States},
year = {2021},
month = {5}
}