Compiler-Assisted Detection of Transient Memory Errors
The probability of bit flips in hardware memory systems is projected to increase significantly as memory systems continue to scale in size and complexity. Effective hardware-based error detection and correction requires that the complete data path, involving all parts of the memory system, be protected with sufficient redundancy. First, this may be costly to employ on commodity computing platforms and second, even on high-end systems, protection against multi-bit errors may be lacking. Therefore, augmenting hardware error detection schemes with software techniques is of consider- able interest. In this paper, we consider software-level mechanisms to comprehensively detect transient memory faults. We develop novel compile-time algorithms to instrument application programs with checksum computation codes so as to detect memory errors. Unlike prior approaches that employ checksums on computational and architectural state, our scheme verifies every data access and works by tracking variables as they are produced and consumed. Experimental evaluation demonstrates that the proposed comprehensive error detection solution is viable as a completely software-only scheme. We also demonstrate that with limited hardware support, overheads of error detection can be further reduced.
- Publication Date:
- OSTI Identifier:
- Report Number(s):
- DOE Contract Number:
- Resource Type:
- Resource Relation:
- Conference: Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI'14), June 9-11, 2014, Edinburgh, UK, 204-215
- Association for Computing Machinery, New York, NY, United States(US).
- Research Org:
- Pacific Northwest National Laboratory (PNNL), Richland, WA (US)
- Sponsoring Org:
- Country of Publication:
- United States
- fault detection; def-use analysis