U.S. Department of Energy
Office of Scientific and Technical Information

Characterizing output bottlenecks in a supercomputer

Conference · OSTI ID:1096349

Supercomputer I/O loads are often dominated by writes. High-performance computing (HPC) file systems are designed to absorb these bursty outputs at high bandwidth through massive parallelism. However, the delivered write bandwidth often falls well below the peak. This paper characterizes the data absorption behavior of a center-wide shared Lustre parallel file system on the Jaguar supercomputer. We use a statistical methodology to address the challenges of accurately measuring a shared machine under production load and to obtain the distribution of bandwidth across samples of compute nodes, storage targets, and time intervals. We observe and quantify limitations from competing traffic, contention on storage servers and I/O routers, concurrency limitations in the client compute node operating systems, and the impact of variance (stragglers) on coupled output such as striping. We then examine the implications of our results for application performance and the design of I/O middleware systems on shared supercomputers.
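
The straggler effect on striped output mentioned above can be illustrated with a small sketch. This is not the paper's methodology or data: it assumes a simplified model in which a write is split evenly across storage targets and completes only when the slowest target finishes. The function name `striped_write_bandwidth` and all bandwidth figures are hypothetical, chosen only for illustration.

```python
def striped_write_bandwidth(per_target_mb_s, stripe_mb):
    """Effective bandwidth (MB/s) of one write striped evenly across targets.

    Simplifying assumption: every target receives stripe_mb megabytes and
    the write completes only when the slowest target has finished.
    """
    if not per_target_mb_s or min(per_target_mb_s) <= 0:
        raise ValueError("need at least one target with positive bandwidth")
    total_mb = stripe_mb * len(per_target_mb_s)
    # The coupled write takes as long as its slowest stripe.
    slowest_seconds = max(stripe_mb / bw for bw in per_target_mb_s)
    return total_mb / slowest_seconds


# Hypothetical per-target bandwidths in MB/s, for illustration only.
uniform = [512.0, 512.0, 512.0, 512.0]
one_straggler = [512.0, 512.0, 512.0, 128.0]

uniform_bw = striped_write_bandwidth(uniform, stripe_mb=64.0)
straggler_bw = striped_write_bandwidth(one_straggler, stripe_mb=64.0)

print(f"uniform targets: {uniform_bw:.0f} MB/s")
print(f"one straggler:   {straggler_bw:.0f} MB/s")
```

Under this model the aggregate bandwidth equals the number of targets times the slowest target's bandwidth, so a single slow target caps the whole striped write regardless of how fast the others are.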

Research Organization:
Oak Ridge National Laboratory (ORNL); Center for Computational Sciences
Sponsoring Organization:
SC USDOE - Office of Science (SC)
DOE Contract Number:
AC05-00OR22725
OSTI ID:
1096349
Country of Publication:
United States
Language:
English

Similar Records

Characterizing output bottlenecks in a supercomputer
Conference · Sat Dec 31 23:00:00 EST 2011 · OSTI ID:1063838

Characterizing Output Bottlenecks of a Production Supercomputer: Analysis and Implications
Journal Article · Tue Feb 04 23:00:00 EST 2020 · ACM Transactions on Storage · OSTI ID:1607202
