
Architecture and performance of Perlmutter's 35 PB ClusterStor E1000 all-flash file system

Journal Article · Concurrency and Computation: Practice and Experience
DOI: https://doi.org/10.1002/cpe.8143 · OSTI ID: 2440410
NERSC's newest system, Perlmutter, features a 35 PB all-flash Lustre file system built on HPE Cray ClusterStor E1000. Here, we present its architecture, early performance figures, and performance considerations unique to this architecture. We demonstrate the performance of E1000 OSSes through low-level Lustre tests that achieve over 90% of the theoretical bandwidth of the SSDs at the OST and LNet levels. We also show end-to-end performance for both traditional dimensions of I/O performance (peak bulk-synchronous bandwidth) and nonoptimal workloads endemic to production computing (small, incoherent I/Os at random offsets) and compare them to NERSC's previous system, Cori, to illustrate that Perlmutter achieves the performance of a burst buffer and the resilience of a scratch file system. Finally, we discuss performance considerations unique to all-flash Lustre and present ways in which users and HPC facilities can adjust their I/O patterns and operations to make optimal use of such architectures.
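
To make the two I/O dimensions in the abstract concrete, the C sketch below contrasts large, sequential bulk-synchronous writes (the peak-bandwidth dimension) with small writes at random offsets (the nonoptimal production pattern). This is an illustrative single-node sketch, not the paper's benchmark suite; the file name, block sizes, and file size are assumptions, and error handling on each write is omitted for brevity. In practice, facilities typically measure these dimensions at scale with parallel benchmarks such as IOR.

/* Illustrative only: contrasts the two access patterns measured in the paper.
 * File name, block sizes, and total size are arbitrary choices. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BULK_BLOCK  (4 * 1024 * 1024)   /* 4 MiB blocks: bandwidth-friendly */
#define SMALL_BLOCK 4096                /* 4 KiB blocks: IOPS-bound pattern */
#define FILE_SIZE   (256L * 1024 * 1024)

static void bulk_sequential(int fd, char *buf) {
    /* Large, aligned, sequential writes: the "peak bandwidth" dimension. */
    for (off_t off = 0; off < FILE_SIZE; off += BULK_BLOCK)
        pwrite(fd, buf, BULK_BLOCK, off);
}

static void small_random(int fd, char *buf) {
    /* Small writes at random offsets: the "incoherent" production pattern. */
    long nblocks = FILE_SIZE / SMALL_BLOCK;
    for (long i = 0; i < nblocks; i++) {
        off_t off = (off_t)(rand() % nblocks) * SMALL_BLOCK;
        pwrite(fd, buf, SMALL_BLOCK, off);
    }
}

int main(void) {
    char *buf = malloc(BULK_BLOCK);
    if (!buf) return 1;
    memset(buf, 0xAB, BULK_BLOCK);
    int fd = open("testfile.dat", O_CREAT | O_WRONLY, 0644);
    if (fd < 0) { perror("open"); return 1; }
    bulk_sequential(fd, buf);   /* time this section for bandwidth */
    small_random(fd, buf);      /* time this section for IOPS */
    close(fd);
    free(buf);
    return 0;
}

On an all-flash file system such as Perlmutter's, the gap between these two patterns is far smaller than on disk-based systems, which is the behavior the paper quantifies.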
Research Organization:
Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR); USDOE Office of Science (SC), Basic Energy Sciences (BES), Scientific User Facilities (SUF)
Grant/Contract Number:
AC02-05CH11231
OSTI ID:
2440410
Alternate ID(s):
OSTI ID: 2406270
Journal Information:
Concurrency and Computation: Practice and Experience, Vol. 36, Issue 23; ISSN 1532-0626
Publisher:
Wiley
Country of Publication:
United States
Language:
English
