U.S. Department of Energy
Office of Scientific and Technical Information

Revealing power, energy and thermal dynamics of a 200PF pre-exascale supercomputer

Conference
As we approach the exascale computing era, a focused understanding of power consumption and its overall constraint on HPC architectures and applications is becoming increasingly important. Summit, located at the Oak Ridge Leadership Computing Facility (OLCF), is one of the fastest and largest pre-exascale platforms in operation today. This paper provides a first-order examination and analysis of power consumption at the component, node, and system levels, using data from all 4,626 Summit compute nodes, each reporting over 100 metrics at 1 Hz for the entire year of 2020. We also investigate the power characteristics and energy efficiency of over 840k Summit jobs and 250k GPU failure logs for further operational insights. To the best of our knowledge, this is the first systematic analysis of power data from an HPC system at this scale.
Research Organization:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Sponsoring Organization:
USDOE; USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR) (SC-21)
DOE Contract Number:
AC05-00OR22725
OSTI ID:
1833956
Country of Publication:
United States
Language:
English

Similar Records

OLCF Summit Supercomputer GPU Snapshots During Double-Bit Errors and Normal Operations
Dataset · Thu Apr 20 00:00:00 EDT 2023 · OSTI ID:1970187

Long Term Per-Component Power and Thermal Measurements of the OLCF Summit System
Dataset · Mon Apr 11 00:00:00 EDT 2022 · OSTI ID:1861393

Pre-exascale accelerated application development: The ORNL Summit experience
Journal Article · Thu Apr 30 20:00:00 EDT 2020 · IBM Journal of Research and Development · OSTI ID:1649509
