OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Reducing I/O variability using dynamic I/O path characterization in petascale storage systems

Abstract

In petascale systems with a million CPU cores, scalable and consistent I/O performance is becoming increasingly difficult to sustain, mainly because of I/O variability. This variability is caused by concurrently running processes or jobs competing for I/O, or by a RAID rebuild after a disk drive fails. We present a mechanism that, at runtime, stripes data across a selected subset of the I/O nodes with the lightest workload to achieve the highest I/O bandwidth available in the system. Specifically, we propose a probing mechanism that enables application-level dynamic file striping to mitigate I/O variability. We implement the proposed mechanism in a high-level I/O library that enables memory-to-file data layout transformation and allows transparent file partitioning using subfiling. Subfiling is a technique that partitions data into a set of smaller files and manages access to them so that users see the data as a single, normal file. We demonstrate that our bandwidth probing mechanism can successfully identify temporarily slower I/O nodes without noticeable runtime overhead. Experimental results on NERSC’s systems also show that our approach effectively isolates I/O variability on shared systems and improves overall collective I/O performance with less variation.
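
The selection step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the probe values, and the even split into contiguous subfiles are all assumptions made for illustration. It assumes that a probe has already produced one bandwidth figure per I/O target, keeps the k fastest targets, and assigns one subfile byte range to each.

```python
from typing import Dict, List, Tuple


def select_fastest_targets(probed_mbps: Dict[int, float], k: int) -> List[int]:
    """Return the IDs of the k targets with the highest probed bandwidth."""
    if not 0 < k <= len(probed_mbps):
        raise ValueError("k must be between 1 and the number of probed targets")
    # Sort by bandwidth (descending); break ties by target ID for determinism.
    ranked = sorted(probed_mbps, key=lambda t: (-probed_mbps[t], t))
    return ranked[:k]


def subfile_ranges(total_bytes: int, targets: List[int]) -> List[Tuple[int, int, int]]:
    """Split [0, total_bytes) into one contiguous subfile per selected target.

    Returns (target_id, start_offset, end_offset) tuples; end is exclusive.
    """
    base, extra = divmod(total_bytes, len(targets))
    ranges = []
    start = 0
    for i, target in enumerate(targets):
        size = base + (1 if i < extra else 0)  # spread the remainder evenly
        ranges.append((target, start, start + size))
        start += size
    return ranges


# Hypothetical probe results in MB/s; targets 1 and 3 are temporarily slow.
probed = {0: 410.0, 1: 95.0, 2: 388.0, 3: 120.0, 4: 402.0}
chosen = select_fastest_targets(probed, k=3)
layout = subfile_ranges(1000, chosen)
print(chosen)
print(layout)
```

In this toy input the two slow targets are excluded and the 1000 bytes are divided among the remaining three. A real system would probe repeatedly, since the abstract stresses that slowdowns are temporary.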

Authors:
Son, Seung Woo [1]; Sehrish, Saba [2]; Liao, Wei-keng [3]; Oldfield, Ron [4]; Choudhary, Alok [3]
  1. Univ. of Massachusetts, Lowell, MA (United States)
  2. Fermi National Accelerator Lab. (FNAL), Batavia, IL (United States)
  3. Northwestern Univ., Evanston, IL (United States)
  4. Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
Publication Date:
November 2016
Research Org.:
Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Sandia National Lab. (SNL-CA), Livermore, CA (United States); Fermi National Accelerator Lab. (FNAL), Batavia, IL (United States)
Sponsoring Org.:
USDOE National Nuclear Security Administration (NNSA); National Science Foundation (NSF); USDOE Office of Science (SC), High Energy Physics (HEP) (SC-25)
OSTI Identifier:
1356839
Alternate Identifier(s):
OSTI ID: 1469016
Report Number(s):
SAND-2017-3907J; FERMILAB-PUB-17-292-CD
Journal ID: ISSN 0920-8542; PII: 1904
Grant/Contract Number:  
AC04-94AL85000; FG02-08ER25848; SC0001283; SC0005309; SC0005340; SC0007456; AC02-05CH11231; AC02-07CH11359
Resource Type:
Journal Article: Accepted Manuscript
Journal Name:
Journal of Supercomputing
Additional Journal Information:
Journal Volume: 73; Journal Issue: 5; Journal ID: ISSN 0920-8542
Publisher:
Springer
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; parallel I/O; I/O variability; subfile; PnetCDF

Citation Formats

Son, Seung Woo, Sehrish, Saba, Liao, Wei-keng, Oldfield, Ron, and Choudhary, Alok. Reducing I/O variability using dynamic I/O path characterization in petascale storage systems. United States: N. p., 2016. Web. doi:10.1007/s11227-016-1904-7.
Son, Seung Woo, Sehrish, Saba, Liao, Wei-keng, Oldfield, Ron, & Choudhary, Alok. Reducing I/O variability using dynamic I/O path characterization in petascale storage systems. United States. doi:10.1007/s11227-016-1904-7.
Son, Seung Woo, Sehrish, Saba, Liao, Wei-keng, Oldfield, Ron, and Choudhary, Alok. 2016. "Reducing I/O variability using dynamic I/O path characterization in petascale storage systems". United States. doi:10.1007/s11227-016-1904-7. https://www.osti.gov/servlets/purl/1356839.
@article{osti_1356839,
title = {Reducing I/O variability using dynamic I/O path characterization in petascale storage systems},
author = {Son, Seung Woo and Sehrish, Saba and Liao, Wei-keng and Oldfield, Ron and Choudhary, Alok},
doi = {10.1007/s11227-016-1904-7},
journal = {Journal of Supercomputing},
issn = {0920-8542},
number = 5,
volume = 73,
place = {United States},
year = {2016},
month = {11}
}

