skip to main content

SciTech ConnectSciTech Connect

Title: Compiler-Assisted Detection of Transient Memory Errors

The probability of bit flips in hardware memory systems is projected to increase significantly as memory systems continue to scale in size and complexity. Effective hardware-based error detection and correction requires that the complete data path, involving all parts of the memory system, be protected with sufficient redundancy. First, this may be costly to employ on commodity computing platforms and second, even on high-end systems, protection against multi-bit errors may be lacking. Therefore, augmenting hardware error detection schemes with software techniques is of consider- able interest. In this paper, we consider software-level mechanisms to comprehensively detect transient memory faults. We develop novel compile-time algorithms to instrument application programs with checksum computation codes so as to detect memory errors. Unlike prior approaches that employ checksums on computational and architectural state, our scheme verifies every data access and works by tracking variables as they are produced and consumed. Experimental evaluation demonstrates that the proposed comprehensive error detection solution is viable as a completely software-only scheme. We also demonstrate that with limited hardware support, overheads of error detection can be further reduced.
Authors:
; ;
Publication Date:
OSTI Identifier:
1178887
Report Number(s):
PNNL-SA-100876
KJ0402000
DOE Contract Number:
AC05-76RL01830
Resource Type:
Conference
Resource Relation:
Conference: Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI'14), June 9-11, 2014, Edinburgh, UK, 204-215
Publisher:
Association for Computing Machinery, New York, NY, United States(US).
Research Org:
Pacific Northwest National Laboratory (PNNL), Richland, WA (US)
Sponsoring Org:
USDOE
Country of Publication:
United States
Language:
English
Subject:
fault detection; def-use analysis