OSTI.GOV · U.S. Department of Energy
Office of Scientific and Technical Information

Title: Scalable I/O Systems via Node-Local Storage: Approaching 1 TB/sec File I/O

Technical Report · DOI: https://doi.org/10.2172/964079

In the race toward PetaFLOP-speed supercomputing systems, the increase in computational capability has been accompanied by corresponding increases in CPU count, total RAM, and storage capacity. Storage bandwidth, however, has not kept pace. To improve system reliability and reduce maintenance effort for modern large-scale systems, designers have opted to remove node-local storage from the compute nodes. Today's multi-TeraFLOP supercomputers are typically attached to parallel file systems that provide only tens of GB/s of I/O bandwidth. As a result, such machines have access to far less than 1 GB/s of I/O bandwidth per TeraFLOP of compute power, which is below the generally accepted minimum for a well-balanced system. In many ways, the current I/O bottleneck limits the capabilities of modern supercomputers, in particular by constraining their working sets and restricting fault-tolerance techniques, which become critical on systems consisting of tens of thousands of components. This paper resolves the dilemma between high performance and high reliability by presenting an alternative system design that uses node-local storage to improve aggregate system I/O bandwidth. In this work, we focus on the checkpointing use case and present an experimental evaluation of the Scalable Checkpoint/Restart (SCR) library, a new adaptive checkpointing library that uses node-local storage to significantly improve the checkpointing performance of large-scale supercomputers. Experiments show that SCR achieves unprecedented write speeds, reaching a measured 700 GB/s of aggregate bandwidth on 8,752 processors and an estimated 1 TB/s for a similarly structured machine of 12,500 processors. This corresponds to a speedup of over 70x compared to the 10 GB/s parallel file system the cluster uses. Further, SCR can adapt to an environment in which there is wide variation in performance or capacity among the individual node-local storage elements.
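To make the checkpointing use case concrete, the following is a minimal sketch of how an MPI application might write a checkpoint through the SCR library described above. It follows SCR's documented C interface (SCR_Init, SCR_Need_checkpoint, SCR_Start_checkpoint, SCR_Route_file, SCR_Complete_checkpoint, SCR_Finalize), but it is an illustrative assumption of typical usage, not an excerpt from the report; exact signatures and behavior may vary across SCR versions.

/* Sketch: one checkpoint cycle with SCR. SCR routes each rank's file to
 * node-local storage and manages redundancy/flushing to the parallel file
 * system behind the scenes. Names follow SCR's C API; details may differ. */
#include <stdio.h>
#include <mpi.h>
#include "scr.h"

int main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);
    SCR_Init();                        /* initialize SCR after MPI_Init */

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* ... application time steps ... */

    int need = 0;
    SCR_Need_checkpoint(&need);        /* ask SCR whether a checkpoint is due */
    if (need) {
        SCR_Start_checkpoint();

        /* Ask SCR where to write this rank's file; SCR typically maps the
         * logical name to a path on fast node-local storage. */
        char name[256];
        char path[SCR_MAX_FILENAME];
        snprintf(name, sizeof(name), "rank_%d.ckpt", rank);
        SCR_Route_file(name, path);

        int valid = 0;
        FILE* fp = fopen(path, "w");
        if (fp != NULL) {
            fprintf(fp, "checkpoint data for rank %d\n", rank);  /* app state */
            valid = (fclose(fp) == 0);
        }

        /* Report per-rank success so SCR can mark the checkpoint complete. */
        SCR_Complete_checkpoint(valid);
    }

    SCR_Finalize();                    /* flush pending checkpoints, shut down SCR */
    MPI_Finalize();
    return 0;
}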

Research Organization:
Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
Sponsoring Organization:
USDOE
DOE Contract Number:
W-7405-ENG-48
OSTI ID:
964079
Report Number(s):
LLNL-TR-415791; TRN: US200919%%180
Country of Publication:
United States
Language:
English