Detecting Soft Errors in Stencil based Computations
Abstract
Given the growing emphasis on system resilience, it is important to develop software-level error detectors that help trap hardware-level faults with reasonable accuracy while minimizing false alarms as well as the performance overhead introduced. We present a technique that approaches this idea by taking stencil computations as our target, and synthesizing detectors based on machine learning. In particular, we employ linear regression to generate computationally inexpensive models which form the basis for error detection. Our technique has been incorporated into a new open-source library called SORREL. In addition to reporting encouraging experimental results, we demonstrate techniques that help reduce the size of training data. We also discuss the efficacy of various detectors synthesized, as well as our future plans.
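The idea in the abstract, a cheap linear model fitted on fault-free stencil output and then used to flag values that deviate from its prediction, can be sketched as follows. This is a minimal illustration and not SORREL's API: the 1D heat stencil, the feature choice (left, centre, right neighbour plus a bias term), the threshold rule, and all function names are assumptions made for this example.

```python
import numpy as np

def stencil_step(u, alpha=0.1):
    """One explicit 1D heat-equation step; boundary values stay fixed."""
    new = u.copy()
    new[1:-1] = u[1:-1] + alpha * (u[:-2] - 2.0 * u[1:-1] + u[2:])
    return new

def neighborhoods(u):
    """Feature rows (left, centre, right, 1) for every interior point."""
    return np.column_stack([u[:-2], u[1:-1], u[2:], np.ones(u.size - 2)])

def train_detector(u0, steps=50, margin=10.0):
    """Fit a least-squares linear model on fault-free steps.

    Returns (weights, threshold). The threshold is a hypothetical rule:
    a multiple of the largest training residual plus a small floor.
    """
    X, y, u = [], [], u0
    for _ in range(steps):
        nxt = stencil_step(u)
        X.append(neighborhoods(u))
        y.append(nxt[1:-1])
        u = nxt
    X, y = np.vstack(X), np.concatenate(y)
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    threshold = margin * np.abs(X @ w - y).max() + 1e-9
    return w, threshold

def check(u_prev, u_next, w, threshold):
    """Grid indices whose new value deviates from the model's prediction."""
    err = np.abs(neighborhoods(u_prev) @ w - u_next[1:-1])
    return np.flatnonzero(err > threshold) + 1

# Train on one fault-free run, then test on an unseen grid.
rng = np.random.default_rng(0)
w, thr = train_detector(rng.random(64))

u_prev = rng.random(64)
u_next = stencil_step(u_prev)
u_next[17] += 1e-3            # simulated soft error at one grid point
flagged = check(u_prev, u_next, w, thr)
print(flagged)
```

Because this toy stencil is itself linear, the fitted model reproduces it almost exactly, so the threshold can be very tight. For nonlinear or noisier kernels the residuals would be larger, and the threshold would have to trade detection rate against false alarms.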
- Authors:
- Sharma, V.; Gopalkrishnan, G.; Bronevetsky, G.
- Univ. of Utah, Salt Lake City, UT (United States)
- Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
- Publication Date:
- 2015-05-06
- Research Org.:
- Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
- Sponsoring Org.:
- USDOE
- OSTI Identifier:
- 1184174
- Report Number(s):
- LLNL-TR-670435
- DOE Contract Number:
- DE-AC52-07NA27344
- Resource Type:
- Technical Report
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 97 MATHEMATICS, COMPUTING, AND INFORMATION SCIENCE
Citation Formats
Sharma, V., Gopalkrishnan, G., and Bronevetsky, G. Detecting Soft Errors in Stencil based Computations. United States: N. p., 2015.
Web. doi:10.2172/1184174.
Sharma, V., Gopalkrishnan, G., & Bronevetsky, G. (2015). Detecting Soft Errors in Stencil based Computations. United States. doi:10.2172/1184174.
Sharma, V., Gopalkrishnan, G., and Bronevetsky, G. 2015.
"Detecting Soft Errors in Stencil based Computations". United States.
doi:10.2172/1184174. https://www.osti.gov/servlets/purl/1184174.
@article{osti_1184174,
title = {Detecting Soft Errors in Stencil based Computations},
author = {Sharma, V. and Gopalkrishnan, G. and Bronevetsky, G.},
doi = {10.2172/1184174},
place = {United States},
year = {2015},
month = {5}
}
-
A broad range of scientific computation involves the use of difference stencils. In a parallel computing environment, this computation is typically implemented by decomposing the spatial domain, inducing a 'halo exchange' of process-owned boundary data. This approach adheres to the Bulk Synchronous Parallel (BSP) model. Because commonly available architectures provide strong inter-node bandwidth relative to latency costs, many codes 'bulk up' these messages by aggregating data into a message as a means of reducing the number of messages. A renewed focus on non-traditional architectures and architecture features provides new opportunities for exploring alternatives to this programming approach. In this report…
-
Stencil computations for PDE-based applications with examples from DUNE and hypre
Here, stencils are commonly used to implement efficient on-the-fly computations of linear operators arising from partial differential equations. At the same time the term “stencil” is not fully defined and can be interpreted differently depending on the application domain and the background of the software developers. Common features in stencil codes are the preservation of the structure given by the discretization of the partial differential equation and the benefit of minimal data storage. We discuss stencil concepts of different complexity, show how they are used in modern software packages like hypre and DUNE, and discuss recent efforts to extend the…
-
DEPOSITION AND WASHOUT COMPUTATIONS BASED ON THE GENERALIZED GAUSSIAN PLUME MODEL
Calculations of the washout and surface deposition of effluent, based on dispersion parameters in the literature, are presented. Maximum rates of deposition and washout for any given release are established. Washout appears to be the principal factor in maximizing surface contamination. Dry deposition is more likely to occur than washout. (auth)
-
Microprocessor-based data acquisition system incorporating a floating-point arithmetic unit for complex mathematical computations. [To determine thickness of conductors on printed wiring boards from measured resistance by use of curve-fitting equations]
A microprocessor-based, stored-program controller which incorporates a floating-point arithmetic unit to perform complex mathematical computations was developed to determine the thickness of conductors on printed wiring boards. Conductor thickness is calculated from measured resistance by means of curve-fitting equations in the stored program. Called a film thickness calculator, the instrument demonstrates a method which may serve as a basis for other designs involving microprocessor-based data acquisition systems requiring low-speed calculations. 19 figures.