OSTI.GOV | U.S. Department of Energy
Office of Scientific and Technical Information

Title: Combating the Reliability Challenge of GPU Register File at Low Supply Voltage

Abstract

Supply voltage reduction is an effective approach to significantly reduce GPU energy consumption. As the largest on-chip storage structure, the GPU register file becomes the reliability hotspot that prevents further supply voltage reduction below the safe limit (Vmin) due to process variation effects. This work addresses the reliability challenge of the GPU register file at low supply voltages, which is an essential first step for aggressive supply voltage reduction of the entire GPU chip. We propose GR-Guard, an architectural solution that leverages long register dead time to enable reliable operations from an unreliable register file at low voltages.
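The abstract rests on the observation that register values often sit unused for long stretches ("register dead time"). As an illustration only, the sketch below measures dead intervals from a hypothetical register access trace. The trace format, the function name `dead_intervals`, and the definition used (a dead interval runs from the last access to a value until the next write that overwrites it) are assumptions for this example; the abstract does not describe how GR-Guard itself detects or uses dead time.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical access record: (cycle, register name, "R" for a read or "W" for a write).
Access = Tuple[int, str, str]


def dead_intervals(trace: Iterable[Access]) -> Dict[str, List[int]]:
    """Return the length, in cycles, of every dead interval of each register.

    Assumed definition (illustrative only): a dead interval runs from the last
    access to a value (its final read, or its write if it is never read) to
    the next write of the same register, which overwrites that value.
    """
    last_use: Dict[str, int] = {}  # cycle of the latest access to the current value
    dead: Dict[str, List[int]] = defaultdict(list)
    # Sorting by (cycle, register, kind) puts "R" before "W" within a cycle,
    # so an instruction that reads and rewrites a register is handled read-first.
    for cycle, reg, kind in sorted(trace):
        if kind == "R":
            last_use[reg] = cycle
        elif kind == "W":
            if reg in last_use:  # an earlier value is being overwritten
                dead[reg].append(cycle - last_use[reg])
            last_use[reg] = cycle
        else:
            raise ValueError(f"unknown access kind: {kind!r}")
    return dict(dead)


if __name__ == "__main__":
    example = [
        (0, "r1", "W"), (3, "r1", "R"), (5, "r1", "R"), (40, "r1", "W"),
        (2, "r2", "W"), (10, "r2", "W"),
    ]
    print(dead_intervals(example))
```

The example prints {'r2': [8], 'r1': [35]}: r1 holds a value that is no longer needed for 35 cycles, from its last read at cycle 5 until it is rewritten at cycle 40, while r2's first value is overwritten at cycle 10 without ever being read. Long intervals like r1's are the kind of window the abstract refers to; whether and how GR-Guard exploits them in this way is not stated in this record.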

Authors:
Tan, Jingweijia; Song, Shuaiwen; Yan, Kaige; Fu, Xin; Marquez, Andres; Kerbyson, Darren J.
Publication Date:
September 2016
Research Org.:
Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1339050
Report Number(s):
PNNL-SA-119484
KJ0402000
DOE Contract Number:
AC05-76RL01830
Resource Type:
Conference
Resource Relation:
Conference: Proceedings of the 25th International Conference on Parallel Architectures and Compilation (PACT '16), September 11-15, 2016, Haifa, Israel, pp. 3-15
Country of Publication:
United States
Language:
English
Subject:
Energy-efficient design; Architecture-Compiler co-design; low voltage GPU

Citation Formats

MLA: Tan, Jingweijia, Song, Shuaiwen, Yan, Kaige, Fu, Xin, Marquez, Andres, and Kerbyson, Darren J. Combating the Reliability Challenge of GPU Register File at Low Supply Voltage. United States: N. p., 2016. Web. doi:10.1145/2967938.2967951.
APA: Tan, Jingweijia, Song, Shuaiwen, Yan, Kaige, Fu, Xin, Marquez, Andres, & Kerbyson, Darren J. Combating the Reliability Challenge of GPU Register File at Low Supply Voltage. United States. doi:10.1145/2967938.2967951.
Chicago: Tan, Jingweijia, Song, Shuaiwen, Yan, Kaige, Fu, Xin, Marquez, Andres, and Kerbyson, Darren J. 2016. "Combating the Reliability Challenge of GPU Register File at Low Supply Voltage". United States. doi:10.1145/2967938.2967951.
@inproceedings{osti_1339050,
title = {Combating the Reliability Challenge of {GPU} Register File at Low Supply Voltage},
author = {Tan, Jingweijia and Song, Shuaiwen and Yan, Kaige and Fu, Xin and Marquez, Andres and Kerbyson, Darren J.},
abstractNote = {Supply voltage reduction is an effective approach to significantly reduce GPU energy consumption. As the largest on-chip storage structure, the GPU register file becomes the reliability hotspot that prevents further supply voltage reduction below the safe limit (Vmin) due to process variation effects. This work addresses the reliability challenge of the GPU register file at low supply voltages, which is an essential first step for aggressive supply voltage reduction of the entire GPU chip. We propose GR-Guard, an architectural solution that leverages long register dead time to enable reliable operations from an unreliable register file at low voltages.},
doi = {10.1145/2967938.2967951},
booktitle = {Proceedings of the 25th International Conference on Parallel Architectures and Compilation (PACT '16)},
pages = {3--15},
address = {Haifa, Israel},
place = {United States},
year = {2016},
month = sep
}

