Title: Reducing I/O variability using dynamic I/O path characterization in petascale storage systems

In petascale systems with a million CPU cores, scalable and consistent I/O performance is becoming increasingly difficult to sustain, mainly because of I/O variability. This variability is caused by concurrently running processes or jobs competing for I/O, or by a RAID rebuild after a disk drive fails. We present a mechanism that stripes across a selected subset of I/O nodes with the lightest workload at runtime to achieve the highest I/O bandwidth available in the system. In this paper, we propose a probing mechanism to enable application-level dynamic file striping to mitigate I/O variability. We also implement the proposed mechanism in a high-level I/O library that enables memory-to-file data layout transformation and allows transparent file partitioning using subfiling. Subfiling is a technique that partitions data into a set of smaller files and manages access to them, so that users can treat the data as a single, normal file. Here, we demonstrate that our bandwidth probing mechanism can successfully identify temporally slower I/O nodes without noticeable runtime overhead. Experimental results on NERSC’s systems also show that our approach isolates I/O variability effectively on shared systems and improves overall collective I/O performance with less variation.
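The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the node labels, and the bandwidth figures are all hypothetical, and the probing itself (timing a small write to each I/O node) is assumed to have already produced the numbers. The sketch only shows the idea of keeping the least-loaded nodes and spreading subfiles across them.

```python
from typing import Dict, List


def select_fast_nodes(probe_mb_s: Dict[str, float], k: int) -> List[str]:
    """Return the k nodes with the highest probed bandwidth, fastest first."""
    if not 0 < k <= len(probe_mb_s):
        raise ValueError("k must be between 1 and the number of probed nodes")
    # Sort by bandwidth descending; break ties by node name for determinism.
    ranked = sorted(probe_mb_s, key=lambda n: (-probe_mb_s[n], n))
    return ranked[:k]


def assign_subfiles(num_subfiles: int, nodes: List[str]) -> Dict[int, str]:
    """Map each subfile index to a selected node in round-robin order."""
    return {i: nodes[i % len(nodes)] for i in range(num_subfiles)}


# Hypothetical probe results in MB/s; "ion2" stands in for a node that is
# temporarily slow, for example because of contention or a RAID rebuild.
probes = {"ion0": 410.0, "ion1": 395.0, "ion2": 120.0, "ion3": 402.0}
selected = select_fast_nodes(probes, k=3)
layout = assign_subfiles(4, selected)
```

With these made-up numbers the slow node is left out of the layout, so no subfile is striped onto it. A real system would need to re-probe over time, since the abstract describes the slowdown as temporal.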
Authors:
 [1] ;  [2] ;  [3] ;  [4] ;  [3]
  1. Univ. of Massachusetts, Lowell, MA (United States)
  2. Fermi National Accelerator Lab. (FNAL), Batavia, IL (United States)
  3. Northwestern Univ., Evanston, IL (United States)
  4. Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
Publication Date:
Report Number(s):
SAND2017-3907J
Journal ID: ISSN 0920-8542; PII: 1904
Grant/Contract Number:
AC04-94AL85000; FG02-08ER25848; SC0001283; SC0005309; DESC0005340; DESC0007456; AC02-05CH11231
Type:
Accepted Manuscript
Journal Name:
Journal of Supercomputing
Additional Journal Information:
Journal Volume: 73; Journal Issue: 5; Journal ID: ISSN 0920-8542
Publisher:
Springer
Research Org:
Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
Sponsoring Org:
USDOE National Nuclear Security Administration (NNSA); National Science Foundation (NSF)
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; parallel I/O; I/O variability; subfile; PnetCDF
OSTI Identifier:
1356839