Scalable Energy Efficiency with Resilience for High Performance Computing Systems: A Quantitative Methodology
Energy efficiency and resilience are two crucial challenges that HPC systems must address to reach exascale. While energy efficiency and resilience have each been studied extensively on their own, little has been done to understand the interplay between them in HPC systems. Decreasing the supply voltage associated with a given operating frequency for processors and other CMOS-based components can significantly reduce power consumption. However, this often raises system failure rates and consequently increases application execution time. In this work, we present an energy-saving undervolting approach that leverages mainstream resilience techniques to tolerate the increased failures caused by undervolting.
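The trade-off described above can be illustrated with a small toy model. This is a sketch under stated assumptions, not the authors' methodology. It assumes that dynamic CMOS power scales with V² at a fixed frequency, that static power stays constant, and that the failure rate grows exponentially as voltage drops. That growth is governed by a hypothetical sensitivity parameter `beta`. It also assumes checkpoint/restart as the resilience technique, with Young's first-order optimal checkpoint interval. All names and numeric values (`Node`, `beta`, the wattages, the checkpoint cost) are illustrative and do not come from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical node parameters; none of these values come from the paper."""
    v_nom: float = 1.0       # nominal supply voltage (normalized)
    p_static: float = 40.0   # static power in watts, assumed voltage-independent here
    p_dyn_nom: float = 60.0  # dynamic power at v_nom, in watts
    lam_nom: float = 1e-6    # failure rate at v_nom, failures per second
    beta: float = 10.0       # hypothetical sensitivity of failure rate to undervolting


def power(node: Node, v: float) -> float:
    # Dynamic CMOS power scales roughly with V^2 at a fixed operating frequency.
    return node.p_static + node.p_dyn_nom * (v / node.v_nom) ** 2


def failure_rate(node: Node, v: float) -> float:
    # Hypothetical model: the failure rate grows exponentially as voltage drops.
    return node.lam_nom * math.exp(node.beta * (node.v_nom - v) / node.v_nom)


def run_time(t0: float, ckpt_cost: float, lam: float) -> float:
    # Young's interval tau = sqrt(2*C/lam). The first-order overhead fraction
    # C/tau + lam*tau/2 then simplifies to sqrt(2*C*lam). Restart cost is ignored.
    return t0 * (1.0 + math.sqrt(2.0 * ckpt_cost * lam))


def energy(node: Node, v: float, t0: float = 3600.0, ckpt_cost: float = 60.0) -> float:
    # Energy = average power x failure-inflated execution time (joules).
    return power(node, v) * run_time(t0, ckpt_cost, failure_rate(node, v))


if __name__ == "__main__":
    node = Node()
    for v in (1.0, 0.9, 0.8):
        print(f"V={v:.1f}  E={energy(node, v) / 1e3:.1f} kJ")
```

With these made-up parameters, moderate undervolting lowers total energy because the V² power reduction outweighs the extra checkpoint and rework time. A much larger `beta` (failure rate rising steeply with undervolting) reverses that conclusion. This is exactly the interplay the abstract says must be quantified.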
- Research Organization:
- Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
- Sponsoring Organization:
- USDOE
- DOE Contract Number:
- AC05-76RL01830
- OSTI ID:
- 1253880
- Report Number(s):
- PNNL-SA-114444; KJ0402000
- Resource Relation:
- Conference: 11th International Conference on High-Performance Embedded Architectures and Compilers (HiPEAC 2016), January 18-20, 2016, Prague, Czech Republic
- Country of Publication:
- United States
- Language:
- English
Similar Records
Investigating the Interplay between Energy Efficiency and Resilience in High Performance Computing
CoREC: Scalable and Resilient In-memory Data Staging for In-situ Workflows