Early Experiences with Node-Level Power Capping on the Cray XC40 Platform
Power consumption of extreme-scale supercomputers has become a key performance bottleneck. Yet current practices do not leverage power management opportunities, instead running at "maximum power". This is not sustainable. Future systems will need to manage power as a critical resource, directing it to where it has the greatest benefit. Power capping is one mechanism for managing power budgets; however, its behavior is not well understood. This paper presents an empirical evaluation of several key HPC workloads running under a power cap on a Cray XC40 system and compares this technique with p-state control, demonstrating the performance differences between the two. The results show that: (1) maximum performance requires ensuring the cap is not reached; (2) performance slowdown under a cap can be attributed to cascading delays, which result in unsynchronized performance variability across nodes; and (3) due to lag in reaction time, considerable time is spent operating above the set cap. This work provides a timely and much-needed comparison of HPC application performance under a power cap and aims to help users and system administrators understand how best to optimize application performance on power-constrained HPC systems.
- Research Organization: Sandia National Laboratories (SNL-NM), Albuquerque, NM (United States)
- Sponsoring Organization: USDOE National Nuclear Security Administration (NNSA)
- DOE Contract Number: AC04-94AL85000
- OSTI ID: 1338038
- Report Number(s): SAND2015-8821C; 614669
- Country of Publication: United States
- Language: English
Similar Records
Preparing NERSC users for Cori, a Cray XC40 system with Intel many integrated cores
Journal Article · Thu Aug 24 20:00:00 EDT 2017 · Concurrency and Computation. Practice and Experience · OSTI ID: 1459400
Are we witnessing the spectre of an HPC meltdown?
Journal Article · Mon Oct 15 20:00:00 EDT 2018 · Concurrency and Computation. Practice and Experience · OSTI ID: 1488719
Uncovering I/O demands on HPC platforms: Peeking under the hood of Santos Dumont
Journal Article · Sat Aug 05 20:00:00 EDT 2023 · Journal of Parallel and Distributed Computing · OSTI ID: 2439951