Scalable I/O Systems via Node-Local Storage: Approaching 1 TB/sec File I/O
The growth in the computational capability of modern supercomputing systems has been accompanied by corresponding increases in CPU count, total RAM, and total storage capacity. Indeed, systems such as BlueGene/L [3], BlueGene/P, Ranger, and the Cray XT series have grown to more than 100k processors with 100 terabytes of RAM, and are attached to multi-petabyte storage systems. However, as part of this design evolution, large supercomputers have lost node-local storage elements, such as disks. While this decision was motivated by important considerations like overall system reliability, it also cost these systems a key level in their memory hierarchy, leaving nothing to fill the gap between local RAM and the parallel file system. Although today's large supercomputers are typically attached to fast parallel file systems that provide tens of GB/s of I/O bandwidth, computational capacity has grown much faster than storage bandwidth. As a result, these machines are now provided with much less than 1 GB/s of I/O bandwidth per teraflop of compute power, which is below the generally accepted limit required for a well-balanced system [8] [16]. Limited I/O bandwidth is therefore choking the capabilities of modern supercomputers, specifically by limiting their working sets and making fault tolerance techniques, such as checkpointing, prohibitively expensive. This paper presents an alternative system design that uses node-local storage to improve aggregate system I/O bandwidth. We focus on the checkpointing use case and present an experimental evaluation of SCR, a new checkpointing library that uses node-local storage to significantly improve checkpointing performance on large-scale supercomputers. Experiments show that SCR achieves unprecedented write speeds, reaching 700 GB/s on 8,752 processors. Our results scale such that we expect a similarly structured system consisting of 12,500 processors to achieve an aggregate I/O bandwidth of 1 TB/s.
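The 1 TB/s projection can be checked with a short calculation. The sketch below is illustrative only and is not taken from the paper: it assumes aggregate bandwidth grows linearly with processor count, which is the simplest reading of "our results scale", and all function and variable names are hypothetical.

```python
# Illustrative check of the abstract's projection, assuming (hypothetically)
# that aggregate write bandwidth scales linearly with processor count.

def per_processor_bandwidth_gbs(aggregate_gbs: float, processors: int) -> float:
    """Average bandwidth contributed by each processor, in GB/s."""
    return aggregate_gbs / processors


def projected_aggregate_gbs(per_processor_gbs: float, processors: int) -> float:
    """Aggregate bandwidth for a given processor count under linear scaling."""
    return per_processor_gbs * processors


# Figures quoted in the abstract.
measured_gbs = 700.0      # reported aggregate write speed, GB/s
measured_procs = 8752     # processors used in the reported experiment
target_procs = 12500      # processor count of the projected system

per_proc = per_processor_bandwidth_gbs(measured_gbs, measured_procs)
projected = projected_aggregate_gbs(per_proc, target_procs)

print(f"Per-processor bandwidth: {per_proc * 1000:.1f} MB/s")
print(f"Projected aggregate at {target_procs} processors: {projected:.1f} GB/s")
```

Under this linear-scaling assumption, each processor contributes roughly 80 MB/s, and 12,500 processors come out just under 1,000 GB/s, which is consistent with the 1 TB/s estimate stated above.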
- Research Organization:
- Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
- Sponsoring Organization:
- USDOE
- DOE Contract Number:
- W-7405-ENG-48
- OSTI ID:
- 945860
- Report Number(s):
- LLNL-CONF-404044; TRN: US200903%%840
- Resource Relation:
- Conference: Presented at: Supercomputing, Austin, TX, United States, Nov 15 - Nov 21, 2008
- Country of Publication:
- United States
- Language:
- English