DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Achieving algorithmic resilience for temporal integration through spectral deferred corrections

Abstract

Spectral deferred corrections (SDC) is an iterative approach for constructing higher-order-accurate numerical approximations of ordinary differential equations. SDC starts with an initial approximation of the solution defined at a set of Gaussian or spectral collocation nodes over a time interval and uses an iterative application of lower-order time discretizations applied to a correction equation to improve the solution at these nodes. Each deferred correction sweep increases the formal order of accuracy of the method up to the limit inherent in the accuracy defined by the collocation points. In this paper, we demonstrate that SDC is well suited to recovering from soft (transient) hardware faults in the data. A strategy where extra correction iterations are used to recover from soft errors and provide algorithmic resilience is proposed. Specifically, in this approach the iteration is continued until the residual (a measure of the error in the approximation) is small relative to the residual of the first correction iteration and changes slowly between successive iterations. Here, we demonstrate the effectiveness of this strategy for both canonical test problems and a comprehensive situation involving a mature scientific application code that solves the reacting Navier-Stokes equations for combustion research.
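
The stopping strategy described in the abstract can be sketched for a scalar test problem as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses explicit Euler correction sweeps on Gauss-Lobatto nodes, and the function names (lobatto_nodes, integration_matrix, sdc_step), the thresholds rel_tol and change_tol, and the fault model (adding a fixed perturbation to one node value after a chosen sweep) are all hypothetical choices made for this example.

```python
import numpy as np

def lobatto_nodes(n):
    """Return n Gauss-Lobatto nodes mapped from [-1, 1] to [0, 1]."""
    interior = np.polynomial.legendre.Legendre.basis(n - 1).deriv().roots()
    nodes = np.concatenate(([-1.0], np.sort(np.real(interior)), [1.0]))
    return 0.5 * (nodes + 1.0)

def integration_matrix(tau):
    """Q[m, j] = integral from 0 to tau[m] of the j-th Lagrange basis polynomial."""
    n = len(tau)
    V = np.vander(tau, increasing=True)   # V[m, p] = tau[m]**p
    p = np.arange(1, n + 1)
    A = tau[:, None] ** p / p             # integral of x**(p-1) over [0, tau[m]]
    return A @ np.linalg.inv(V)

def sdc_step(f, t0, y0, dt, n_nodes=5, max_iter=50,
             rel_tol=1e-8, change_tol=1e-8, fault=None):
    """Advance scalar y' = f(t, y) over one step with explicit Euler SDC sweeps.

    fault = (iteration, node, delta) adds delta to one node value after the
    given sweep, mimicking a transient soft error (assumed fault model).
    Returns the end-of-step value and the residual history.
    """
    tau = lobatto_nodes(n_nodes)
    Q = integration_matrix(tau)
    t = t0 + dt * tau
    y = np.full(n_nodes, float(y0))       # initial approximation: constant in time
    history = []
    for k in range(1, max_iter + 1):
        f_old = f(t, y)
        quad = dt * (Q @ f_old)           # spectral integral of the previous iterate
        y_new = y.copy()
        y_new[0] = y0                     # re-impose the initial condition each sweep
        for m in range(n_nodes - 1):
            h = t[m + 1] - t[m]
            y_new[m + 1] = (y_new[m] + h * (f(t[m], y_new[m]) - f_old[m])
                            + quad[m + 1] - quad[m])
        if fault is not None and fault[0] == k:
            y_new[fault[1]] += fault[2]   # inject the soft error
        y = y_new
        # Residual of the collocation equation, maximum over the nodes.
        res = np.max(np.abs(y0 + dt * (Q @ f(t, y)) - y))
        history.append(res)
        if k >= 2:
            # Assumed thresholds: small relative to the first-sweep residual,
            # and changing slowly between successive sweeps.
            small = res <= rel_tol * history[0]
            slow = abs(res - history[-2]) <= change_tol * history[0]
            if small and slow:
                break
    return y[-1], history

# Usage: y' = -y, y(0) = 1, one step of size 0.25, with and without a fault.
f = lambda t, y: -y
y_clean, hist_clean = sdc_step(f, 0.0, 1.0, 0.25)
y_fault, hist_fault = sdc_step(f, 0.0, 1.0, 0.25, fault=(3, 2, 1.0))
print("sweeps without fault:", len(hist_clean), "with fault:", len(hist_fault))
print("end values:", y_clean, y_fault, "exact:", np.exp(-0.25))
```

In this sketch the injected error shows up as a jump in the residual at the faulty sweep, and the loop simply keeps sweeping until the residual is again small and nearly stationary, so the faulted run costs extra sweeps rather than a restart. Note that both thresholds are measured against the first-sweep residual, so a fault during the first sweep would loosen the test.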

Authors:
 Grout, Ray [1]; Kolla, Hemanth [2]; Minion, Michael [3]; Bell, John [3]
  1. National Renewable Energy Lab. (NREL), Golden, CO (United States). Computational Science Center
  2. Sandia National Lab. (SNL-CA), Livermore, CA (United States). Scalable Modeling and Analysis Dept.
  3. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Computational Research Division
Publication Date:
2017-05-08
Research Org.:
Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)
OSTI Identifier:
1436145
Grant/Contract Number:  
AC02-05CH11231; AC36-08GO28308
Resource Type:
Accepted Manuscript
Journal Name:
Communications in Applied Mathematics and Computational Science
Additional Journal Information:
Journal Volume: 12; Journal Issue: 1; Journal ID: ISSN 1559-3940
Publisher:
Mathematical Sciences Publishers
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; SDC; resilience; time integration; deferred correction; exascale computing; combustion

Citation Formats

Grout, Ray, Kolla, Hemanth, Minion, Michael, and Bell, John. Achieving algorithmic resilience for temporal integration through spectral deferred corrections. United States: N. p., 2017. Web. doi:10.2140/camcos.2017.12.25.
Grout, Ray, Kolla, Hemanth, Minion, Michael, & Bell, John. Achieving algorithmic resilience for temporal integration through spectral deferred corrections. United States. https://doi.org/10.2140/camcos.2017.12.25
Grout, Ray, Kolla, Hemanth, Minion, Michael, and Bell, John. 2017. "Achieving algorithmic resilience for temporal integration through spectral deferred corrections". United States. https://doi.org/10.2140/camcos.2017.12.25. https://www.osti.gov/servlets/purl/1436145.
@article{osti_1436145,
title = {Achieving algorithmic resilience for temporal integration through spectral deferred corrections},
author = {Grout, Ray and Kolla, Hemanth and Minion, Michael and Bell, John},
abstractNote = {Spectral deferred corrections (SDC) is an iterative approach for constructing higher-order-accurate numerical approximations of ordinary differential equations. SDC starts with an initial approximation of the solution defined at a set of Gaussian or spectral collocation nodes over a time interval and uses an iterative application of lower-order time discretizations applied to a correction equation to improve the solution at these nodes. Each deferred correction sweep increases the formal order of accuracy of the method up to the limit inherent in the accuracy defined by the collocation points. In this paper, we demonstrate that SDC is well suited to recovering from soft (transient) hardware faults in the data. A strategy where extra correction iterations are used to recover from soft errors and provide algorithmic resilience is proposed. Specifically, in this approach the iteration is continued until the residual (a measure of the error in the approximation) is small relative to the residual of the first correction iteration and changes slowly between successive iterations. Here, we demonstrate the effectiveness of this strategy for both canonical test problems and a comprehensive situation involving a mature scientific application code that solves the reacting Navier-Stokes equations for combustion research.},
doi = {10.2140/camcos.2017.12.25},
journal = {Communications in Applied Mathematics and Computational Science},
number = 1,
volume = 12,
place = {United States},
year = {2017},
month = {May}
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record

Citation Metrics:
Cited by: 3 works
Citation information provided by
Web of Science

Works referenced in this record:

Turbulent flame–wall interaction: a direct numerical simulation study
journal, August 2010


Silent error detection in numerical time-stepping schemes
journal, April 2014

  • Benson, Austin R.; Schmit, Sven; Schreiber, Robert
  • The International Journal of High Performance Computing Applications, Vol. 29, Issue 4
  • DOI: 10.1177/1094342014532297

Structure of a spatially developing turbulent lean methane–air Bunsen flame
journal, January 2007

  • Sankaran, Ramanan; Hawkes, Evatt R.; Chen, Jacqueline H.
  • Proceedings of the Combustion Institute, Vol. 31, Issue 1
  • DOI: 10.1016/j.proci.2006.08.025

Terascale direct numerical simulations of turbulent combustion using S3D
journal, January 2009


Conservative multi-implicit spectral deferred correction methods for reacting gas dynamics
journal, March 2004


Cosmic rays don't strike twice: understanding the nature of DRAM errors and the implications for system design
conference, January 2012

  • Hwang, Andy A.; Stefanovici, Ioan A.; Schroeder, Bianca
  • Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS '12
  • DOI: 10.1145/2150976.2150989

Comments on high-order integrators embedded within integral deferred correction methods
journal, January 2009

  • Christlieb, Andrew; Ong, Benjamin; Qiu, Jing-Mei
  • Communications in Applied Mathematics and Computational Science, Vol. 4, Issue 1
  • DOI: 10.2140/camcos.2009.4.27

Semi-implicit spectral deferred correction methods for ordinary differential equations
journal, January 2003


The effect of threshold voltages on the soft error rate [memory and logic circuits]
conference, January 2004

  • Degalahal, V.; Ramanarayanan, R.; Vijaykrishnan, N.
  • 5th International Symposium on Quality Electronic Design (ISQED 2004)
  • DOI: 10.1109/ISQED.2004.1283723

Design challenges of technology scaling
journal, January 1999


Evaluation of models for flame stretch due to curvature in the thin reaction zones regime
journal, January 2005


A study of DRAM failures in the field
conference, November 2012

  • Sridharan, Vilas; Liberty, Dean
  • 2012 International Conference for High Performance Computing, Networking, Storage and Analysis (SC '12)
  • DOI: 10.1109/SC.2012.13

Numerical Analysis of Fixed Point Algorithms in the Presence of Hardware Faults
journal, January 2015

  • Stoyanov, Miroslav; Webster, Clayton
  • SIAM Journal on Scientific Computing, Vol. 37, Issue 5
  • DOI: 10.1137/140991406

Quantifying the Accuracy of High-Level Fault Injection Techniques for Hardware Faults
conference, June 2014

  • Wei, Jiesheng; Thomas, Anna; Li, Guanpeng
  • 2014 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)
  • DOI: 10.1109/DSN.2014.2

Spectral Deferred Correction Methods for Ordinary Differential Equations
journal, June 2000

  • Dutt, Alok; Greengard, Leslie; Rokhlin, Vladimir
  • BIT Numerical Mathematics, Vol. 40, Issue 2, p. 241-266
  • DOI: 10.1023/A:1022338906936

Direct numerical simulation of flame stabilization downstream of a transverse fuel jet in cross-flow
journal, January 2011


Implications of the Choice of Quadrature Nodes for Picard Integral Deferred Corrections Methods for Ordinary Differential Equations
journal, June 2005


Scalar mixing in direct numerical simulations of temporally evolving plane jet flames with skeletal CO/H2 kinetics
journal, January 2007

  • Hawkes, Evatt R.; Sankaran, Ramanan; Sutherland, James C.
  • Proceedings of the Combustion Institute, Vol. 31, Issue 1
  • DOI: 10.1016/j.proci.2006.08.079

A deferred correction coupling strategy for low Mach number flow with complex chemistry
journal, December 2012


Asynchronous finite-difference schemes for partial differential equations
journal, October 2014


The effects of non-uniform temperature distribution on the ignition of a lean homogeneous hydrogen–air mixture
journal, January 2005

  • Sankaran, Ramanan; Im, Hong G.; Hawkes, Evatt R.
  • Proceedings of the Combustion Institute, Vol. 30, Issue 1
  • DOI: 10.1016/j.proci.2004.08.176

Accelerating S3D: A GPGPU Case Study
book, January 2010


High-order multi-implicit spectral deferred correction methods for problems of reactive flow
journal, August 2003


Toward an efficient parallel in time method for partial differential equations
journal, January 2012

  • Emmett, Matthew; Minion, Michael
  • Communications in Applied Mathematics and Computational Science, Vol. 7, Issue 1
  • DOI: 10.2140/camcos.2012.7.105

Solving Ordinary Differential Equations II
book, September 1996


An algorithmic approach to error localization and partial recomputation for low-overhead fault tolerance
conference, June 2013

  • Sloan, Joseph; Kumar, Rakesh; Bronevetsky, Greg
  • 2013 43rd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)
  • DOI: 10.1109/DSN.2013.6575309

Low-storage, explicit Runge–Kutta schemes for the compressible Navier–Stokes equations
journal, November 2000

  • Kennedy, Christopher A.; Carpenter, Mark H.; Lewis, R. Michael
  • Applied Numerical Mathematics, Vol. 35, Issue 3
  • DOI: 10.1016/S0168-9274(99)00141-5

Cooperative Application/OS DRAM Fault Recovery
book, January 2012


An updated comprehensive kinetic model of hydrogen combustion
journal, January 2004

  • Li, Juan; Zhao, Zhenwei; Kazakov, Andrei
  • International Journal of Chemical Kinetics, Vol. 36, Issue 10
  • DOI: 10.1002/kin.20026

DRAM errors in the wild: a large-scale field study
journal, February 2011

  • Schroeder, Bianca; Pinheiro, Eduardo; Weber, Wolf-Dietrich
  • Communications of the ACM, Vol. 54, Issue 2
  • DOI: 10.1145/1897816.1897844

Direct numerical simulation of autoignition in non-homogeneous hydrogen-air mixtures
journal, August 2003


Evaluating the Error Resilience of Parallel Programs
conference, June 2014

  • Fang, Bo; Pattabiraman, Karthik; Ripeanu, Matei
  • 2014 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)
  • DOI: 10.1109/DSN.2014.73

Performance of Under-resolved Two-Dimensional Incompressible Flow Simulations
journal, November 1995


Impact of deep submicron technology on dependability of VLSI circuits
conference, January 2002

  • Constantinescu, C.
  • Proceedings International Conference on Dependable Systems and Networks
  • DOI: 10.1109/DSN.2002.1028901

Works referencing / citing this record:

Data recovery in computational fluid dynamics through deep image priors
preprint, January 2019


A scalable weakly-synchronous algorithm for solving partial differential equations
preprint, January 2019