
Title: Detecting Soft Errors in Stencil based Computations

Given the growing emphasis on system resilience, it is important to develop software-level error detectors that trap hardware-level faults with reasonable accuracy while minimizing both false alarms and the performance overhead they introduce. We present a technique that pursues this goal for stencil computations by synthesizing error detectors with machine learning. In particular, we employ linear regression to generate computationally inexpensive models that form the basis for error detection. Our technique has been incorporated into a new open-source library called SORREL. In addition to reporting encouraging experimental results, we demonstrate techniques that help reduce the size of the training data. We also discuss the efficacy of the various detectors synthesized, as well as our future plans.
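To make the idea of a regression-based detector concrete, here is a minimal sketch of the general technique. It is not SORREL's API or the authors' exact method. It rests on several assumptions: a 1D explicit heat-equation stencil stands in for the target computation, each updated interior point is predicted from its three neighbors at the previous timestep, the model is fit with ordinary least squares (`numpy.linalg.lstsq`), and a point is flagged when its prediction residual exceeds a threshold derived from the training residuals. The function names (`stencil_step`, `features`, `train_detector`, `detect`, `flip_bit`) and the `slack`/`floor` threshold rule are hypothetical illustrations. A single flipped bit in a double stands in for a soft error.

```python
import numpy as np

def stencil_step(u, alpha=0.1):
    # Hypothetical target: one explicit 1D heat-equation step, fixed boundaries.
    v = u.copy()
    v[1:-1] = u[1:-1] + alpha * (u[:-2] - 2.0 * u[1:-1] + u[2:])
    return v

def features(u):
    # Assumed predictors: left/center/right neighbors at the previous step, plus a bias term.
    return np.column_stack([u[:-2], u[1:-1], u[2:], np.ones(len(u) - 2)])

def train_detector(snapshots, slack=10.0, floor=1e-9):
    # Fit a linear model mapping previous-step neighborhoods to updated values.
    X = np.vstack([features(prev) for prev, _ in snapshots])
    y = np.concatenate([curr[1:-1] for _, curr in snapshots])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    # Assumed threshold rule: scaled worst training residual, with an absolute floor.
    resid = np.abs(X @ coef - y)
    return coef, max(resid.max() * slack, floor)

def detect(prev, curr, coef, threshold):
    # Return indices of interior points whose residual exceeds the threshold.
    resid = np.abs(features(prev) @ coef - curr[1:-1])
    return np.nonzero(resid > threshold)[0] + 1

def flip_bit(x, bit):
    # Emulate a soft error by flipping one bit of an IEEE 754 double.
    b = np.array([x], dtype=np.float64).view(np.uint64)
    b ^= np.uint64(1) << np.uint64(bit)
    return float(b.view(np.float64)[0])

# Train on fault-free runs from random initial conditions.
rng = np.random.default_rng(0)
snapshots = []
for _ in range(5):
    u = rng.random(64)
    for _ in range(20):
        v = stencil_step(u)
        snapshots.append((u, v))
        u = v
coef, threshold = train_detector(snapshots)

# Check a clean step, then the same step with one corrupted value.
u = rng.random(64)
v = stencil_step(u)
print(detect(u, v, coef, threshold))      # []
v_bad = v.copy()
v_bad[30] = flip_bit(v_bad[30], 62)       # flip the top exponent bit
print(detect(u, v_bad, coef, threshold))  # [30]
```

Because this toy stencil is exactly linear, the fitted coefficients recover the stencil weights and clean residuals sit near rounding error. Real applications would have noisier residuals, which is where the choice of threshold trades detection accuracy against false alarms.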
Authors:
 [1] ;  [1] ;  [2]
  1. Univ. of Utah, Salt Lake City, UT (United States)
  2. Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
Publication Date:
OSTI Identifier:
1184174
Report Number(s):
LLNL-TR-670435
DOE Contract Number:
DE-AC52-07NA27344
Resource Type:
Technical Report
Research Org:
Lawrence Livermore National Laboratory (LLNL), Livermore, CA (United States)
Sponsoring Org:
USDOE
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS, COMPUTING, AND INFORMATION SCIENCE