Title: Demystifying asynchronous I/O Interference in HPC applications

Journal Article · International Journal of High Performance Computing Applications

As HPC workflows grow more complex, data management services need to perform expensive I/O operations asynchronously in the background, aiming to overlap the I/O with the application runtime. However, this may cause interference due to competition for resources such as CPU, memory bandwidth, and network bandwidth. The advent of multi-core architectures has exacerbated this problem, as many I/O operations are issued concurrently and compete not only with the application but also among themselves. Furthermore, the interference patterns can change dynamically in response to variations in application behavior and in the I/O subsystems (e.g., multiple users sharing a parallel file system). Without a thorough understanding of this interference, asynchronous I/O operations may perform suboptimally, potentially even worse than in the blocking case. To fill this gap, we investigate the causes and consequences of interference due to asynchronous I/O on HPC systems. Specifically, we focus on multi-core CPUs and memory bandwidth, isolating the interference due to each resource. Then, we perform an in-depth study to explain the interplay and contention in a variety of resource-sharing scenarios, such as varying the priority and number of background I/O threads and using different I/O strategies (sendfile, read/write, mmap/write), underlining the trade-offs. The insights from this study are important both for enabling guided optimizations of existing background I/O and for opening new opportunities to design advanced asynchronous I/O strategies.
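
For readers unfamiliar with the three I/O strategies named in the abstract, the following is a minimal Python sketch of a file-to-file copy done each way. It is an illustration only, not the authors' benchmark code: the function names, the `CHUNK` size, and the use of Python wrappers around the underlying system calls are all assumptions made here, and `os.sendfile` with a regular file as the destination is assumed to run on Linux.

```python
import mmap
import os
import tempfile

CHUNK = 1 << 20  # 1 MiB per call; an arbitrary value chosen for this sketch


def copy_read_write(src, dst):
    """read/write: every chunk is staged through a user-space buffer."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            buf = fin.read(CHUNK)
            if not buf:
                break
            fout.write(buf)


def copy_mmap_write(src, dst):
    """mmap/write: map the source into memory, then write from the mapping."""
    size = os.path.getsize(src)
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        if size == 0:
            return  # an empty file cannot be memory-mapped
        with mmap.mmap(fin.fileno(), 0, access=mmap.ACCESS_READ) as mm, \
                memoryview(mm) as view:
            for off in range(0, size, CHUNK):
                # Slicing a memoryview does not copy the mapped pages.
                fout.write(view[off:off + CHUNK])


def copy_sendfile(src, dst):
    """sendfile: the kernel moves the bytes without a user-space buffer."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        remaining = os.fstat(fin.fileno()).st_size
        offset = 0
        while remaining > 0:
            sent = os.sendfile(fout.fileno(), fin.fileno(),
                               offset, min(CHUNK, remaining))
            if sent == 0:
                break  # unexpected end of input
            offset += sent
            remaining -= sent


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src.bin")
        payload = os.urandom(2 * CHUNK + 123)
        with open(src, "wb") as f:
            f.write(payload)
        for fn in (copy_read_write, copy_mmap_write, copy_sendfile):
            dst = os.path.join(tmp, fn.__name__ + ".bin")
            fn(src, dst)
            with open(dst, "rb") as f:
                print(fn.__name__, "matches source:", f.read() == payload)
```

The three functions produce identical output files; they differ in how much data passes through user-space memory, which is the kind of resource usage the abstract identifies as a source of interference.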

Research Organization:
Argonne National Laboratory (ANL), Argonne, IL (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)
Grant/Contract Number:
AC02-06CH11357
OSTI ID:
1831116
Journal Information:
International Journal of High Performance Computing Applications, Vol. 35, Issue 4; ISSN 1094-3420
Publisher:
SAGE
Country of Publication:
United States
Language:
English
