OSTI.GOV | U.S. Department of Energy
Office of Scientific and Technical Information

Title: Detecting Soft Errors in Stencil based Computations

Abstract

Given the growing emphasis on system resilience, it is important to develop software-level error detectors that trap hardware-level faults with reasonable accuracy while minimizing both false alarms and the performance overhead they introduce. We present a technique that pursues this goal by targeting stencil computations and synthesizing detectors using machine learning. In particular, we employ linear regression to generate computationally inexpensive models which form the basis for error detection. Our technique has been incorporated into a new open-source library called SORREL. In addition to reporting encouraging experimental results, we demonstrate techniques that help reduce the size of training data. We also discuss the efficacy of the various detectors synthesized, as well as our future plans.
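As a rough illustration of a regression-based detector, the sketch below assumes the simplest possible setting: a 1-D three-point heat-equation stencil, a linear model that predicts each interior point's next value from its three stencil inputs at the previous step (plus an intercept), and a fixed residual threshold. The function names (`heat_step`, `fit`, `detect`), the stencil coefficient `r`, and the threshold value are hypothetical and are not SORREL's interface; the features and thresholds SORREL actually uses may differ.

```python
# Hypothetical sketch, not SORREL's API: a linear-regression soft-error
# detector for a 1-D three-point stencil. All names are illustrative.
import random


def heat_step(u, r=0.25):
    """One explicit 1-D heat-equation step; the two boundary values stay fixed."""
    interior = [u[i] + r * (u[i - 1] - 2 * u[i] + u[i + 1]) for i in range(1, len(u) - 1)]
    return [u[0]] + interior + [u[-1]]


def features(u, i):
    """Regressors for point i: its three stencil inputs plus an intercept term."""
    return [u[i - 1], u[i], u[i + 1], 1.0]


def solve(a, b):
    """Gaussian elimination with partial pivoting for a small dense system a x = b."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda k: abs(m[k][col]))
        m[col], m[piv] = m[piv], m[col]
        for k in range(col + 1, n):
            f = m[k][col] / m[col][col]
            for j in range(col, n + 1):
                m[k][j] -= f * m[col][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit(pairs):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y.

    Each pair is (state at step t, state at step t+1) from fault-free runs.
    For heat_step with r=0.25 this recovers roughly [0.25, 0.5, 0.25, 0.0].
    """
    k = 4
    xtx = [[0.0] * k for _ in range(k)]
    xty = [0.0] * k
    for u, v in pairs:
        for i in range(1, len(u) - 1):
            x = features(u, i)
            for a in range(k):
                xty[a] += x[a] * v[i]
                for b in range(k):
                    xtx[a][b] += x[a] * x[b]
    return solve(xtx, xty)


def detect(u, v, w, threshold):
    """Return interior indices whose value in v deviates from the prediction by more than threshold."""
    flagged = []
    for i in range(1, len(u) - 1):
        pred = sum(wi * xi for wi, xi in zip(w, features(u, i)))
        if abs(v[i] - pred) > threshold:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    rng = random.Random(0)
    # Train on a few fault-free steps from random initial states.
    train = []
    for _ in range(5):
        u0 = [rng.random() for _ in range(50)]
        train.append((u0, heat_step(u0)))
    w = fit(train)

    u = [rng.random() for _ in range(50)]
    v = heat_step(u)
    v[20] += 1e-3  # inject a small corruption standing in for a soft error
    print(detect(u, v, w, threshold=1e-6))  # expected: [20]
```

The threshold is the knob that trades missed detections against false alarms: set too low, floating-point noise (or a stencil that a linear model fits less exactly) raises spurious alerts; set too high, small corruptions go unnoticed. The model itself costs only a few multiply-adds per point, which is what keeps this style of detector inexpensive.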

Authors:
Sharma, V. [1]; Gopalkrishnan, G. [1]; Bronevetsky, G. [2]
  1. Univ. of Utah, Salt Lake City, UT (United States)
  2. Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
Publication Date:
May 6, 2015
Research Org.:
Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1184174
Report Number(s):
LLNL-TR-670435
DOE Contract Number:
DE-AC52-07NA27344
Resource Type:
Technical Report
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS, COMPUTING, AND INFORMATION SCIENCE

Citation Formats

Sharma, V., Gopalkrishnan, G., and Bronevetsky, G. Detecting Soft Errors in Stencil based Computations. United States: N. p., 2015. Web. doi:10.2172/1184174.
Sharma, V., Gopalkrishnan, G., & Bronevetsky, G. (2015). Detecting Soft Errors in Stencil based Computations. United States. doi:10.2172/1184174.
Sharma, V., Gopalkrishnan, G., and Bronevetsky, G. 2015. "Detecting Soft Errors in Stencil based Computations". United States. doi:10.2172/1184174. https://www.osti.gov/servlets/purl/1184174.
@techreport{osti_1184174,
title = {Detecting Soft Errors in Stencil based Computations},
author = {Sharma, V. and Gopalkrishnan, G. and Bronevetsky, G.},
abstractNote = {Given the growing emphasis on system resilience, it is important to develop software-level error detectors that help trap hardware-level faults with reasonable accuracy while minimizing false alarms as well as the performance overhead introduced. We present a technique that approaches this idea by taking stencil computations as our target, and synthesizing detectors based on machine learning. In particular, we employ linear regression to generate computationally inexpensive models which form the basis for error detection. Our technique has been incorporated into a new open-source library called SORREL. In addition to reporting encouraging experimental results, we demonstrate techniques that help reduce the size of training data. We also discuss the efficacy of various detectors synthesized, as well as our future plans.},
doi = {10.2172/1184174},
institution = {Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)},
number = {LLNL-TR-670435},
place = {United States},
year = {2015},
month = {May}
}
