OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Demystifying asynchronous I/O Interference in HPC applications

Abstract

With increasing complexity of HPC workflows, data management services need to perform expensive I/O operations asynchronously in the background, aiming to overlap the I/O with the application runtime. However, this may cause interference due to competition for resources: CPU, memory/network bandwidth. The advent of multi-core architectures has exacerbated this problem, as many I/O operations are issued concurrently, thereby competing not only with the application but also among themselves. Furthermore, the interference patterns can dynamically change as a response to variations in application behavior and I/O subsystems (e.g. multiple users sharing a parallel file system). Without a thorough understanding, I/O operations may perform suboptimally, potentially even worse than in the blocking case. To fill this gap, here we investigate the causes and consequences of interference due to asynchronous I/O on HPC systems. Specifically, we focus on multi-core CPUs and memory bandwidth, isolating the interference due to each resource. Then, we perform an in-depth study to explain the interplay and contention in a variety of resource sharing scenarios such as varying priority and number of background I/O threads and different I/O strategies: sendfile, read/write, mmap/write underlining trade-offs. The insights from this study are important both to enable guided optimizations of existing background I/O, as well as to open new opportunities to design advanced asynchronous I/O strategies.

Authors:
Tseng, Shu-Mei [1]; Nicolae, Bogdan [2]; Cappello, Franck [2]; Chandramowlishwaran, Aparna [1]
  1. Univ. of California, Irvine, CA (United States)
  2. Argonne National Lab. (ANL), Argonne, IL (United States)
Publication Date:
Research Org.:
Argonne National Lab. (ANL), Argonne, IL (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)
OSTI Identifier:
1831116
Grant/Contract Number:  
AC02-06CH11357
Resource Type:
Journal Article: Accepted Manuscript
Journal Name:
International Journal of High Performance Computing Applications
Additional Journal Information:
Journal Volume: 35; Journal Issue: 4; Journal ID: ISSN 1094-3420
Publisher:
SAGE
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; I/O interference; asynchronous and concurrent I/O; checkpointing; HPC applications; performance analysis

Citation Formats

Tseng, Shu-Mei, Nicolae, Bogdan, Cappello, Franck, and Chandramowlishwaran, Aparna. Demystifying asynchronous I/O Interference in HPC applications. United States: N. p., 2021. Web. doi:10.1177/10943420211016511.
Tseng, Shu-Mei, Nicolae, Bogdan, Cappello, Franck, & Chandramowlishwaran, Aparna. Demystifying asynchronous I/O Interference in HPC applications. United States. https://doi.org/10.1177/10943420211016511
Tseng, Shu-Mei, Nicolae, Bogdan, Cappello, Franck, and Chandramowlishwaran, Aparna. 2021. "Demystifying asynchronous I/O Interference in HPC applications". United States. https://doi.org/10.1177/10943420211016511. https://www.osti.gov/servlets/purl/1831116.
@article{osti_1831116,
title = {Demystifying asynchronous I/O Interference in HPC applications},
author = {Tseng, Shu-Mei and Nicolae, Bogdan and Cappello, Franck and Chandramowlishwaran, Aparna},
abstractNote = {With increasing complexity of HPC workflows, data management services need to perform expensive I/O operations asynchronously in the background, aiming to overlap the I/O with the application runtime. However, this may cause interference due to competition for resources: CPU, memory/network bandwidth. The advent of multi-core architectures has exacerbated this problem, as many I/O operations are issued concurrently, thereby competing not only with the application but also among themselves. Furthermore, the interference patterns can dynamically change as a response to variations in application behavior and I/O subsystems (e.g. multiple users sharing a parallel file system). Without a thorough understanding, I/O operations may perform suboptimally, potentially even worse than in the blocking case. To fill this gap, here we investigate the causes and consequences of interference due to asynchronous I/O on HPC systems. Specifically, we focus on multi-core CPUs and memory bandwidth, isolating the interference due to each resource. Then, we perform an in-depth study to explain the interplay and contention in a variety of resource sharing scenarios such as varying priority and number of background I/O threads and different I/O strategies: sendfile, read/write, mmap/write underlining trade-offs. The insights from this study are important both to enable guided optimizations of existing background I/O, as well as to open new opportunities to design advanced asynchronous I/O strategies.},
doi = {10.1177/10943420211016511},
url = {https://www.osti.gov/biblio/1831116},
journal = {International Journal of High Performance Computing Applications},
issn = {1094-3420},
number = 4,
volume = 35,
place = {United States},
year = {2021},
month = {5}
}
