OSTI.GOV | U.S. Department of Energy
Office of Scientific and Technical Information

Title: Investigating the Interplay between Energy Efficiency and Resilience in High Performance Computing

Abstract

Energy efficiency and resilience are two crucial challenges for HPC systems to reach exascale. While energy efficiency and resilience issues have been extensively studied individually, little has been done to understand the interplay between energy efficiency and resilience for HPC systems. Decreasing the supply voltage associated with a given operating frequency for processors and other CMOS-based components can significantly reduce power consumption. However, this often raises system failure rates and consequently increases application execution time. In this work, we present an energy saving undervolting approach that leverages the mainstream resilience techniques to tolerate the increased failures caused by undervolting.
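The trade-off described in the abstract (lower voltage saves power, but failures stretch execution time) can be made concrete with a first-order energy model. The sketch below is illustrative only and is not the paper's model. It assumes the textbook CMOS dynamic-power relation P = C_eff·V²·f, ignores static (leakage) power, and treats the extra execution time spent tolerating undervolting-induced failures (for example, checkpoint/restart cost) as a hypothetical input rather than deriving it from failure rates. All names (`OperatingPoint`, `dynamic_power`, `energy`, `break_even_overhead`) and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    """Hypothetical processor operating point (illustration only)."""
    voltage: float    # supply voltage, volts
    frequency: float  # clock frequency, GHz; held fixed when undervolting


def dynamic_power(c_eff: float, op: OperatingPoint) -> float:
    """First-order CMOS dynamic power, P = C_eff * V^2 * f (arbitrary units).

    Static (leakage) power is deliberately ignored in this sketch.
    """
    return c_eff * op.voltage ** 2 * op.frequency


def energy(c_eff: float, op: OperatingPoint, base_time: float,
           resilience_overhead: float) -> float:
    """Energy = power * time, with time stretched by a fractional overhead.

    `resilience_overhead` is a hypothetical input standing in for the extra
    time spent tolerating failures (e.g. checkpointing and recovery); it is
    not derived from any failure-rate model here.
    """
    time = base_time * (1.0 + resilience_overhead)
    return dynamic_power(c_eff, op) * time


def break_even_overhead(nominal: OperatingPoint,
                        undervolted: OperatingPoint) -> float:
    """Largest fractional time overhead at which the undervolted run uses
    no more energy than a failure-free nominal run.

    Solves P_uv * T0 * (1 + o) = P_nom * T0 for o; C_eff and T0 cancel.
    """
    power_ratio = dynamic_power(1.0, nominal) / dynamic_power(1.0, undervolted)
    return power_ratio - 1.0


if __name__ == "__main__":
    nominal = OperatingPoint(voltage=1.0, frequency=2.0)
    undervolted = OperatingPoint(voltage=0.9, frequency=2.0)  # same f, lower V

    e_nom = energy(1.0, nominal, base_time=100.0, resilience_overhead=0.0)
    e_uv = energy(1.0, undervolted, base_time=100.0, resilience_overhead=0.10)
    print(f"nominal energy:      {e_nom:.1f}")  # 200.0
    print(f"undervolted energy:  {e_uv:.1f}")   # 178.2
    print(f"break-even overhead: {break_even_overhead(nominal, undervolted):.3f}")  # 0.235
```

With these made-up numbers, a 10% voltage reduction at fixed frequency cuts dynamic power by about 19%, so undervolting still saves energy as long as the resilience overhead stays below roughly 23.5% of the failure-free run time. The paper's actual failure models, resilience techniques, and measured results are not reproduced here.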

Authors:
Tan, Li; Song, Shuaiwen; Wu, Panruo; Chen, Zizhong; Ge, Rong; Kerbyson, Darren J.
Publication Date:
May 2015
Research Org.:
Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1213014
Report Number(s):
PNNL-SA-109093
KJ0402000
DOE Contract Number:  
AC05-76RL01830
Resource Type:
Conference
Resource Relation:
Conference: IEEE International Parallel and Distributed Processing Symposium (IPDPS 2015), May 25-29, 2015, Hyderabad, India, pp. 786-796
Country of Publication:
United States
Language:
English
Subject:
resilience; failures; undervolting; HPC

Citation Formats

Tan, Li, Song, Shuaiwen, Wu, Panruo, Chen, Zizhong, Ge, Rong, and Kerbyson, Darren J. Investigating the Interplay between Energy Efficiency and Resilience in High Performance Computing. United States: N. p., 2015. Web. doi:10.1109/IPDPS.2015.108.
Tan, Li, Song, Shuaiwen, Wu, Panruo, Chen, Zizhong, Ge, Rong, & Kerbyson, Darren J. Investigating the Interplay between Energy Efficiency and Resilience in High Performance Computing. United States. doi:10.1109/IPDPS.2015.108.
Tan, Li, Song, Shuaiwen, Wu, Panruo, Chen, Zizhong, Ge, Rong, and Kerbyson, Darren J. 2015. "Investigating the Interplay between Energy Efficiency and Resilience in High Performance Computing". United States. doi:10.1109/IPDPS.2015.108.
@inproceedings{osti_1213014,
title = {Investigating the Interplay between Energy Efficiency and Resilience in High Performance Computing},
author = {Tan, Li and Song, Shuaiwen and Wu, Panruo and Chen, Zizhong and Ge, Rong and Kerbyson, Darren J.},
abstractNote = {Energy efficiency and resilience are two crucial challenges for HPC systems to reach exascale. While energy efficiency and resilience issues have been extensively studied individually, little has been done to understand the interplay between energy efficiency and resilience for HPC systems. Decreasing the supply voltage associated with a given operating frequency for processors and other CMOS-based components can significantly reduce power consumption. However, this often raises system failure rates and consequently increases application execution time. In this work, we present an energy saving undervolting approach that leverages the mainstream resilience techniques to tolerate the increased failures caused by undervolting.},
doi = {10.1109/IPDPS.2015.108},
booktitle = {IEEE International Parallel and Distributed Processing Symposium (IPDPS 2015)},
pages = {786--796},
place = {United States},
year = {2015},
month = {5}
}

