FPDetect: Efficient Reasoning About Stencil Programs Using Selective Direct Evaluation
- Univ. of Utah, Salt Lake City, UT (United States)
- Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
We present FPDetect, a low-overhead approach for detecting logical errors and soft errors affecting stencil computations without generating false positives. We develop an offline analysis that tightly estimates the number of floating-point bits preserved across stencil applications. This estimate rigorously bounds the values expected in the data space of the computation. Violations of this bound can be attributed with certainty to errors. FPDetect helps synthesize error detectors customized for user-specified levels of accuracy and coverage. FPDetect also enables overhead reduction techniques based on deploying these detectors coarsely in space and time. Experimental evaluations demonstrate the practicality of our approach.
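The detection idea in the abstract can be illustrated with a small sketch. It is not FPDetect's implementation: the 3-point stencil, the periodic boundary, the helper names, and the `bits_preserved=40` threshold are all assumptions chosen for illustration, whereas the paper derives its bound through a rigorous offline analysis. The sketch iterates a stencil for `t` steps, then directly re-evaluates a sparse subset of points from the step-0 data using the composed `2t+1`-point stencil, and flags any point whose iterated value disagrees in more than the assumed number of preserved bits.

```python
import math
import struct

COEFFS = (0.25, 0.5, 0.25)  # hypothetical 3-point averaging stencil


def step(grid):
    """Apply the stencil once with periodic boundaries."""
    n = len(grid)
    return [
        COEFFS[0] * grid[(i - 1) % n]
        + COEFFS[1] * grid[i]
        + COEFFS[2] * grid[(i + 1) % n]
        for i in range(n)
    ]


def composed_coeffs(t):
    """Coefficients of the stencil applied t times (a 2t+1 point stencil)."""
    c = [1.0]
    for _ in range(t):
        nxt = [0.0] * (len(c) + 2)
        for i, ci in enumerate(c):
            for j, w in enumerate(COEFFS):
                nxt[i + j] += ci * w
        c = nxt
    return c


def direct_eval(grid0, i, coeffs):
    """Evaluate point i after t steps directly from the step-0 grid."""
    t = len(coeffs) // 2
    n = len(grid0)
    return sum(w * grid0[(i + k - t) % n] for k, w in enumerate(coeffs))


def flip_bit(x, bit):
    """Flip one bit of a double's 64-bit representation (soft-error model)."""
    (as_int,) = struct.unpack("<Q", struct.pack("<d", x))
    (flipped,) = struct.unpack("<d", struct.pack("<Q", as_int ^ (1 << bit)))
    return flipped


def run(grid0, t, fault=None):
    """Iterate t steps; fault=(step, index, bit) optionally injects a flip."""
    grid = list(grid0)
    for s in range(t):
        grid = step(grid)
        if fault is not None and fault[0] == s:
            grid[fault[1]] = flip_bit(grid[fault[1]], fault[2])
    return grid


def detect(grid0, grid_t, t, points, bits_preserved):
    """Return the checked points whose value violates the assumed bound."""
    coeffs = composed_coeffs(t)
    tol = 2.0 ** -bits_preserved  # assumed threshold, not a derived bound
    flagged = []
    for i in points:
        expected = direct_eval(grid0, i, coeffs)
        if abs(grid_t[i] - expected) > tol * abs(expected):
            flagged.append(i)
    return flagged


grid0 = [1.0 + 0.5 * math.sin(2 * math.pi * i / 32) for i in range(32)]
checked = range(0, 32, 4)  # detectors placed coarsely in space
print(detect(grid0, run(grid0, 4), 4, checked, 40))
print(detect(grid0, run(grid0, 4, fault=(1, 10, 50)), 4, checked, 40))
```

In this setup the fault-free run should flag nothing, while the bit flip injected at index 10 two steps before the check should be caught at the checked points the corruption has reached. Checking only every fourth point once per four steps mirrors, in a simplified way, the coarse deployment in space and time that the abstract describes.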
- Research Organization:
- Pacific Northwest National Laboratory (PNNL), Richland, WA (United States)
- Sponsoring Organization:
- USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR); National Science Foundation (NSF)
- Grant/Contract Number:
- AC05-76RL01830
- OSTI ID:
- 1673584
- Report Number(s):
- PNNL-SA--153397; Journal ID: ISSN 1544-3566
- Journal Information:
- ACM Transactions on Architecture and Code Optimization, Vol. 17, Issue 3; ISSN 1544-3566
- Publisher:
- Association for Computing Machinery
- Country of Publication:
- United States
- Language:
- English
Related Subjects
fault tolerance
stencil programs
soft-error detection
floating-point analysis
error detection and error correction
computer systems organization
reliability
software and its engineering
software verification
mathematics of computing
numerical analysis
floating point round-off error
stencil computations
affine analysis
interval analysis
silent data corruption
software bug detection