OSTI.GOV · U.S. Department of Energy
Office of Scientific and Technical Information

Title: Enabling power measurement and control on Astra: The first petascale Arm supercomputer

Journal Article · Concurrency and Computation. Practice and Experience
DOI: https://doi.org/10.1002/cpe.7303 · OSTI ID: 1887388

Summary:
Astra, deployed in 2018, was the first petascale supercomputer to use processors based on the Arm instruction set. It was also the first system under Sandia's Vanguard program, which seeks to provide an evaluation vehicle for novel technologies that, with refinement, could be used in demanding, large‐scale HPC environments. In addition to Arm, the machine incorporated several other important first‐of‐a‐kind developments, including new approaches to cooling both the datacenter and the machine itself. This article documents our experiences building a power measurement and control infrastructure for Astra. Although power measurement and control are often beyond the reach of users today, our experiences show that accurate measurement, cataloging, and evaluation of power are critical to the successful deployment of a large‐scale platform. While such infrastructure exists in part for other architectures, Astra required new development to support the novel Marvell ThunderX2 processor used in its compute nodes. In addition to documenting power measurement during system bring‐up and subsequent routine operation, we present results on controlling the processor's power usage, an area of growing interest as data centers and supercomputing sites look to improve compute/energy efficiency and find additional sources of full‐system optimization.

Research Organization:
Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
Sponsoring Organization:
USDOE National Nuclear Security Administration (NNSA)
Grant/Contract Number:
NA0003525
OSTI ID:
1887388
Alternate ID(s):
OSTI ID: 1886072
Report Number(s):
SAND2022-11546J; 709168
Journal Information:
Concurrency and Computation. Practice and Experience, Vol. 35, Issue 15; ISSN 1532-0626
Publisher:
Wiley
Country of Publication:
United States
Language:
English
