skip to main content
OSTI.GOV title logo U.S. Department of Energy
Office of Scientific and Technical Information

Title: Compiler-Assisted Detection of Transient Memory Errors

Conference ·

The probability of bit flips in hardware memory systems is projected to increase significantly as memory systems continue to scale in size and complexity. Effective hardware-based error detection and correction requires that the complete data path, involving all parts of the memory system, be protected with sufficient redundancy. First, this may be costly to employ on commodity computing platforms and second, even on high-end systems, protection against multi-bit errors may be lacking. Therefore, augmenting hardware error detection schemes with software techniques is of consider- able interest. In this paper, we consider software-level mechanisms to comprehensively detect transient memory faults. We develop novel compile-time algorithms to instrument application programs with checksum computation codes so as to detect memory errors. Unlike prior approaches that employ checksums on computational and architectural state, our scheme verifies every data access and works by tracking variables as they are produced and consumed. Experimental evaluation demonstrates that the proposed comprehensive error detection solution is viable as a completely software-only scheme. We also demonstrate that with limited hardware support, overheads of error detection can be further reduced.

Research Organization:
Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
Sponsoring Organization:
USDOE
DOE Contract Number:
AC05-76RL01830
OSTI ID:
1178887
Report Number(s):
PNNL-SA-100876; KJ0402000
Resource Relation:
Conference: Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI'14), June 9-11, 2014, Edinburgh, UK, 204-215
Country of Publication:
United States
Language:
English