Title: Characterization of the Impact of Soft Errors on Iterative Methods

Abstract

Soft errors caused by transient bit flips have the potential to significantly impact an application’s behavior. This has motivated the design of an array of techniques to detect, isolate, and correct soft errors using microarchitectural, architectural, compilation-based, or application-level techniques to minimize their impact on the executing application. The first step towards the design of good error detection/correction techniques involves an understanding of an application’s vulnerability to soft errors. In this paper, we present the first comprehensive characterization of the impact of soft errors on the convergence characteristics of six iterative methods using application-level fault injection. In particular, we consider the use of iterative methods to incrementally solve a linear system of equations, which constitutes the core kernel in many scientific applications. We analyze the impact of soft errors in terms of the type of error (single- vs multi-bit), the distribution and location of bits affected, the data structure and the statement impacted, and variation with time. In addition to understanding the vulnerability of iterative solvers to soft errors, this characterization can aid the design of fault injection campaigns that ensure systematic coverage.

Authors:
Mutlu, Burcu [1]; Kestor, Gokcen G. [2]; Manzano Franco, Joseph B. [1]; Unsal, Osman [3]; Chatterjee, Samrat [1]; Krishnamoorthy, Sriram [1]
  1. BATTELLE (PACIFIC NW LAB)
  2. Oak Ridge National Laboratory
  3. Barcelona Supercomputing Center
Publication Date:
December 2018
Research Org.:
Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1512780
Report Number(s):
PNNL-SA-138072
DOE Contract Number:  
AC05-76RL01830
Resource Type:
Conference
Resource Relation:
Conference: 25TH IEEE INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING, DATA, AND ANALYTICS (HiPC 2018), December 17-20, 2018, Bengaluru, India
Country of Publication:
United States
Language:
English
Subject:
fault injection, iterative methods, Silent data corruption, Soft errors

Citation Formats

Mutlu, Burcu, Kestor, Gokcen G., Manzano Franco, Joseph B., Unsal, Osman, Chatterjee, Samrat, and Krishnamoorthy, Sriram. Characterization of the Impact of Soft Errors on Iterative Methods. United States: N. p., 2018. Web.
Mutlu, Burcu, Kestor, Gokcen G., Manzano Franco, Joseph B., Unsal, Osman, Chatterjee, Samrat, & Krishnamoorthy, Sriram. Characterization of the Impact of Soft Errors on Iterative Methods. United States.
Mutlu, Burcu, Kestor, Gokcen G., Manzano Franco, Joseph B., Unsal, Osman, Chatterjee, Samrat, and Krishnamoorthy, Sriram. 2018. "Characterization of the Impact of Soft Errors on Iterative Methods". United States.
@article{osti_1512780,
title = {Characterization of the Impact of Soft Errors on Iterative Methods},
author = {Mutlu, Burcu and Kestor, Gokcen G. and Manzano Franco, Joseph B. and Unsal, Osman and Chatterjee, Samrat and Krishnamoorthy, Sriram},
abstractNote = {Soft errors caused by transient bit flips have the potential to significantly impact an application’s behavior. This has motivated the design of an array of techniques to detect, isolate, and correct soft errors using microarchitectural, architectural, compilation-based, or application-level techniques to minimize their impact on the executing application. The first step towards the design of good error detection/correction techniques involves an understanding of an application’s vulnerability to soft errors. In this paper, we present the first comprehensive characterization of the impact of soft errors on the convergence characteristics of six iterative methods using application-level fault injection. In particular, we consider the use of iterative methods to incrementally solve a linear system of equations, which constitutes the core kernel in many scientific applications. We analyze the impact of soft errors in terms of the type of error (single- vs multi-bit), the distribution and location of bits affected, the data structure and the statement impacted, and variation with time. In addition to understanding the vulnerability of iterative solvers to soft errors, this characterization can aid the design of fault injection campaigns that ensure systematic coverage.},
place = {United States},
year = {2018},
month = {12}
}

