Enabling power measurement and control on Astra: The first petascale Arm supercomputer
Journal Article
·
· Concurrency and Computation. Practice and Experience
- Queen's University, Kingston, ON (Canada)
- Sandia National Laboratories (SNL-NM), Albuquerque, NM (United States). Center for Computing Research
Astra, deployed in 2018, was the first petascale supercomputer to utilize processors based on the ARM instruction set. The system was also the first under Sandia's Vanguard program which seeks to provide an evaluation vehicle for novel technologies that with refinement could be utilized in demanding, large-scale HPC environments. In addition to ARM, several other important first-of-a-kind developments were used in the machine, including new approaches to cooling the datacenter and machine. Here we document our experiences building a power measurement and control infrastructure for Astra. While this is often beyond the control of users today, the accurate measurement, cataloging, and evaluation of power, as our experiences show, is critical to the successful deployment of a large-scale platform. While such systems exist in part for other architectures, Astra required new development to support the novel Marvell ThunderX2 processor used in compute nodes. In addition to documenting the measurement of power during system bring up and for subsequent on-going routine use, we present results associated with controlling the power usage of the processor, an area which is becoming of progressively greater interest as data centers and supercomputing sites look to improve compute/energy efficiency and find additional sources for full system optimization.
- Research Organization:
- Sandia National Laboratories (SNL-NM), Albuquerque, NM (United States)
- Sponsoring Organization:
- USDOE; USDOE National Nuclear Security Administration (NNSA)
- Grant/Contract Number:
- NA0003525
- OSTI ID:
- 1887388
- Alternate ID(s):
- OSTI ID: 1886072
- Report Number(s):
- SAND2022-11546J; 709168
- Journal Information:
- Concurrency and Computation. Practice and Experience, Journal Name: Concurrency and Computation. Practice and Experience Journal Issue: 15 Vol. 35; ISSN 1532-0626
- Publisher:
- WileyCopyright Statement
- Country of Publication:
- United States
- Language:
- English
Similar Records
FY18 L2 Milestone #6360 Report: Initial Capability of an Arm-based Advanced Architecture Prototype System and Software Environment
Vanguard Astra and ATSE – an ARM-based Advanced Architecture Prototype System and Software Environment (FY18 L2 Milestone #8759 Report)
Technical Report
·
Sat Sep 01 00:00:00 EDT 2018
·
OSTI ID:1493831
Vanguard Astra and ATSE – an ARM-based Advanced Architecture Prototype System and Software Environment (FY18 L2 Milestone #8759 Report)
Technical Report
·
Sat Sep 01 00:00:00 EDT 2018
·
OSTI ID:1470822