Architecture and performance of Perlmutter's 35 PB ClusterStor E1000 all-flash file system
Journal Article · Concurrency and Computation. Practice and Experience
- Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
NERSC's newest system, Perlmutter, features a 35 PB all-flash Lustre file system built on HPE Cray ClusterStor E1000. Here, we present its architecture, early performance figures, and performance considerations unique to this architecture. We demonstrate the performance of E1000 OSSes through low-level Lustre tests that achieve over 90% of the theoretical bandwidth of the SSDs at the OST and LNet levels. We also show end-to-end performance for both traditional dimensions of I/O performance (peak bulk-synchronous bandwidth) and nonoptimal workloads endemic to production computing (small, incoherent I/Os at random offsets) and compare them to NERSC's previous system, Cori, to illustrate that Perlmutter achieves the performance of a burst buffer and the resilience of a scratch file system. Finally, we discuss performance considerations unique to all-flash Lustre and present ways in which users and HPC facilities can adjust their I/O patterns and operations to make optimal use of such architectures.
- Research Organization:
- Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
- Sponsoring Organization:
- USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR); USDOE Office of Science (SC), Basic Energy Sciences (BES). Scientific User Facilities (SUF)
- Grant/Contract Number:
- AC02-05CH11231
- OSTI ID:
- 2440410
- Alternate ID(s):
- OSTI ID: 2406270
- Journal Information:
- Concurrency and Computation. Practice and Experience, Vol. 36, Issue 23; ISSN 1532-0626
- Publisher:
- Wiley
- Country of Publication:
- United States
- Language:
- English