OSTI.GOV
U.S. Department of Energy, Office of Scientific and Technical Information

Title: Characterizing output bottlenecks in a supercomputer

Abstract

Supercomputer I/O loads are often dominated by writes. HPC (High Performance Computing) file systems are designed to absorb these bursty outputs at high bandwidth through massive parallelism. However, the delivered write bandwidth often falls well below the peak. This paper characterizes the data absorption behavior of a center-wide shared Lustre parallel file system on the Jaguar supercomputer. We use a statistical methodology to address the challenges of accurately measuring a shared machine under production load and to obtain the distribution of bandwidth across samples of compute nodes, storage targets, and time intervals. We observe and quantify limitations from competing traffic, contention on storage servers and I/O routers, concurrency limitations in the client compute node operating systems, and the impact of variance (stragglers) on coupled output such as striping. We then examine the implications of our results for application performance and the design of I/O middleware systems on shared supercomputers.
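
The straggler effect on striped output mentioned in the abstract can be illustrated with a small sketch. This is not the paper's methodology or its data: the function names, the equal-sized stripes, and the bandwidth figures are all hypothetical, and the model assumes a striped write finishes only when its slowest storage target finishes.

```python
# Illustrative model only: names and numbers are hypothetical, not from the paper.

def striped_write_time(stripe_bytes, target_bandwidths):
    """Completion time of a striped write, assuming each target receives
    stripe_bytes and the write ends when the slowest target ends."""
    return max(stripe_bytes / bw for bw in target_bandwidths)


def effective_bandwidth(stripe_bytes, target_bandwidths):
    """Aggregate bandwidth delivered to the application for the whole write."""
    total_bytes = stripe_bytes * len(target_bandwidths)
    return total_bytes / striped_write_time(stripe_bytes, target_bandwidths)


# Four targets at equal speed versus the same four with one straggler.
uniform = [100.0, 100.0, 100.0, 100.0]        # hypothetical MB/s per target
with_straggler = [100.0, 100.0, 100.0, 25.0]  # one slow target

uniform_bw = effective_bandwidth(100.0, uniform)
straggler_bw = effective_bandwidth(100.0, with_straggler)
print(uniform_bw, straggler_bw)
```

Under these assumptions the single slow target determines the completion time, so the delivered bandwidth falls well below the sum of the individual target bandwidths.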

Authors:
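Xie, Bing [1]; Chase, Jeffrey [1]; Dillow, David A [2]; Drokin, Oleg [3]; Klasky, Scott A [2]; Oral, H Sarp [2]; Podhorszki, Norbert [2]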
  1. Duke University
  2. ORNL
  3. Intel Corporation
Publication Date:
2012
Research Org.:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). National Center for Computational Sciences (NCCS)
Sponsoring Org.:
USDOE Office of Science (SC)
OSTI Identifier:
1063838
DOE Contract Number:  
DE-AC05-00OR22725
Resource Type:
Conference
Resource Relation:
Conference: IEEE/ACM SC Conference on High Performance Computing Networking, Storage and Analysis, Salt Lake City, UT, USA, 2012-11-11 to 2012-11-15
Country of Publication:
United States
Language:
English

Citation Formats

Xie, Bing, Chase, Jeffrey, Dillow, David A, Drokin, Oleg, Klasky, Scott A, Oral, H Sarp, and Podhorszki, Norbert. Characterizing output bottlenecks in a supercomputer. United States: N. p., 2012. Web.
Xie, Bing, Chase, Jeffrey, Dillow, David A, Drokin, Oleg, Klasky, Scott A, Oral, H Sarp, & Podhorszki, Norbert. Characterizing output bottlenecks in a supercomputer. United States.
Xie, Bing, Chase, Jeffrey, Dillow, David A, Drokin, Oleg, Klasky, Scott A, Oral, H Sarp, and Podhorszki, Norbert. 2012. "Characterizing output bottlenecks in a supercomputer". United States.
@article{osti_1063838,
title = {Characterizing output bottlenecks in a supercomputer},
author = {Xie, Bing and Chase, Jeffrey and Dillow, David A and Drokin, Oleg and Klasky, Scott A and Oral, H Sarp and Podhorszki, Norbert},
abstractNote = {Supercomputer I/O loads are often dominated by writes. HPC (High Performance Computing) file systems are designed to absorb these bursty outputs at high bandwidth through massive parallelism. However, the delivered write bandwidth often falls well below the peak. This paper characterizes the data absorption behavior of a center-wide shared Lustre parallel file system on the Jaguar supercomputer. We use a statistical methodology to address the challenges of accurately measuring a shared machine under production load and to obtain the distribution of bandwidth across samples of compute nodes, storage targets, and time intervals. We observe and quantify limitations from competing traffic, contention on storage servers and I/O routers, concurrency limitations in the client compute node operating systems, and the impact of variance (stragglers) on coupled output such as striping. We then examine the implications of our results for application performance and the design of I/O middleware systems on shared supercomputers.},
url = {https://www.osti.gov/biblio/1063838},
place = {United States},
year = {2012},
month = {1}
}

