Achieving algorithmic resilience for temporal integration through spectral deferred corrections
Abstract
Spectral deferred corrections (SDC) is an iterative approach for constructing higher-order-accurate numerical approximations of ordinary differential equations. SDC starts with an initial approximation of the solution defined at a set of Gaussian or spectral collocation nodes over a time interval, then improves the solution at these nodes by repeatedly applying a lower-order time discretization to a correction equation. Each deferred correction sweep increases the formal order of accuracy of the method, up to the limit set by the accuracy of the underlying collocation scheme. In this paper, we demonstrate that SDC is well suited to recovering from soft (transient) hardware faults in the data. We propose a strategy in which extra correction iterations are used to recover from soft errors and thereby provide algorithmic resilience. Specifically, the iteration is continued until the residual (a measure of the error in the approximation) is small relative to the residual of the first correction iteration and changes slowly between successive iterations. We demonstrate the effectiveness of this strategy both for canonical test problems and for a realistic scenario involving a mature scientific application code that solves the reacting Navier-Stokes equations for combustion research.
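The two ingredients described in the abstract, correction sweeps over collocation nodes and a residual-based stopping rule, can be illustrated with a short Python sketch. It is a minimal sketch under stated assumptions, not the authors' implementation: it assumes Gauss-Lobatto nodes, forward (explicit) Euler applied to the correction equation, the max-norm collocation residual, and a single tolerance `rtol` for both the "small relative to the first residual" and the "changes slowly" tests. The fault parameters (`fault_iter`, `fault_node`, `fault_size`) are hypothetical and simply add a corrupt value to one node to mimic a soft error.

```python
import numpy as np


def lobatto_nodes(n):
    """Gauss-Lobatto nodes mapped to [0, 1]: both endpoints plus the roots of P'_{n-1}."""
    interior = np.polynomial.legendre.Legendre.basis(n - 1).deriv().roots()
    x = np.concatenate(([-1.0], np.sort(interior.real), [1.0]))
    return 0.5 * (x + 1.0)


def integration_matrix(tau):
    """Q[m, j] = integral from 0 to tau[m] of the j-th Lagrange basis polynomial."""
    n = len(tau)
    C = np.linalg.inv(np.vander(tau, increasing=True))  # column j: monomial coefficients of l_j
    k = np.arange(n)
    P = tau[:, None] ** (k + 1) / (k + 1)                # P[m, k] = integral of s**k over [0, tau[m]]
    return P @ C


def residual(f, t, dt, y0, y, Q):
    """Max-norm residual of the collocation equations y = y0 + dt * Q f(y)."""
    F = np.array([f(tm, ym) for tm, ym in zip(t, y)])
    return np.max(np.abs(y0 + dt * Q @ F - y))


def sdc_step(f, t0, dt, y0, n_nodes=5, rtol=1e-8, max_iters=50,
             fault_iter=None, fault_node=2, fault_size=10.0):
    """One SDC step with explicit Euler sweeps and a residual-based stopping rule.

    Returns (approximation at t0 + dt, number of sweeps used)."""
    tau = lobatto_nodes(n_nodes)
    Q = integration_matrix(tau)
    S = np.diff(Q, axis=0)                       # S[m] integrates over [tau[m], tau[m+1]]
    t = t0 + dt * tau
    y = np.array([y0] * n_nodes, dtype=float)    # initial guess: y0 copied to every node
    res_first = res_prev = None
    for k in range(1, max_iters + 1):
        F_old = np.array([f(tm, ym) for tm, ym in zip(t, y)])
        y_new = y.copy()                         # y_new[0] stays equal to y0
        for m in range(n_nodes - 1):             # forward Euler applied to the correction equation
            y_new[m + 1] = (y_new[m]
                            + (t[m + 1] - t[m]) * (f(t[m], y_new[m]) - F_old[m])
                            + dt * S[m] @ F_old)
        y = y_new
        if k == fault_iter:                      # hypothetical soft fault: corrupt one node value
            y[fault_node] += fault_size
        res = residual(f, t, dt, y0, y, Q)
        if res_first is None:
            res_first = res                      # residual of the first correction iteration
        elif res <= rtol * res_first and abs(res - res_prev) <= rtol * res_first:
            return y[-1], k                      # small relative to first residual and no longer changing
        res_prev = res
    return y[-1], max_iters


y_clean, k_clean = sdc_step(lambda t, y: -y, 0.0, 0.1, 1.0)
y_fault, k_fault = sdc_step(lambda t, y: -y, 0.0, 0.1, 1.0, fault_iter=3)
```

For the test problem y' = -y over one step of size 0.1, the faulty run should need more sweeps than the clean run but finish at essentially the same value, because each sweep rebuilds every node value from y0 and the quadrature of the previous iterate, so the corruption is damped rather than carried forward. A fixed sweep count would offer no such guarantee, which is the motivation for the adaptive stopping rule; the paper's actual norms and thresholds may differ from the ones assumed here.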
- Authors:
- Grout, Ray; Kolla, Hemanth; Minion, Michael; Bell, John
- National Renewable Energy Lab. (NREL), Golden, CO (United States). Computational Science Center
- Sandia National Lab. (SNL-CA), Livermore, CA (United States). Scalable Modeling and Analysis Dept.
- Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Computational Research Division
- Publication Date:
- May 8, 2017
- Research Org.:
- Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
- Sponsoring Org.:
- USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)
- OSTI Identifier:
- 1436145
- Grant/Contract Number:
- AC02-05CH11231; AC36-08GO28308
- Resource Type:
- Accepted Manuscript
- Journal Name:
- Communications in Applied Mathematics and Computational Science
- Additional Journal Information:
- Journal Volume: 12; Journal Issue: 1; Journal ID: ISSN 1559-3940
- Publisher:
- Mathematical Sciences Publishers
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 97 MATHEMATICS AND COMPUTING; SDC; resilience; time integration; deferred correction; exascale computing; combustion
Citation Formats
Grout, Ray, Kolla, Hemanth, Minion, Michael, and Bell, John. Achieving algorithmic resilience for temporal integration through spectral deferred corrections. United States: N. p., 2017. Web. doi:10.2140/camcos.2017.12.25.
Grout, Ray, Kolla, Hemanth, Minion, Michael, & Bell, John. Achieving algorithmic resilience for temporal integration through spectral deferred corrections. United States. https://doi.org/10.2140/camcos.2017.12.25
Grout, Ray, Kolla, Hemanth, Minion, Michael, and Bell, John. 2017. "Achieving algorithmic resilience for temporal integration through spectral deferred corrections." United States. https://doi.org/10.2140/camcos.2017.12.25. https://www.osti.gov/servlets/purl/1436145.
@article{osti_1436145,
title = {Achieving algorithmic resilience for temporal integration through spectral deferred corrections},
author = {Grout, Ray and Kolla, Hemanth and Minion, Michael and Bell, John},
abstractNote = {Spectral deferred corrections (SDC) is an iterative approach for constructing higher-order-accurate numerical approximations of ordinary differential equations. SDC starts with an initial approximation of the solution defined at a set of Gaussian or spectral collocation nodes over a time interval and uses an iterative application of lower-order time discretizations applied to a correction equation to improve the solution at these nodes. Each deferred correction sweep increases the formal order of accuracy of the method up to the limit inherent in the accuracy defined by the collocation points. In this paper, we demonstrate that SDC is well suited to recovering from soft (transient) hardware faults in the data. A strategy where extra correction iterations are used to recover from soft errors and provide algorithmic resilience is proposed. Specifically, in this approach the iteration is continued until the residual (a measure of the error in the approximation) is small relative to the residual of the first correction iteration and changes slowly between successive iterations. Here, we demonstrate the effectiveness of this strategy for both canonical test problems and a comprehensive situation involving a mature scientific application code that solves the reacting Navier-Stokes equations for combustion research.},
doi = {10.2140/camcos.2017.12.25},
journal = {Communications in Applied Mathematics and Computational Science},
number = 1,
volume = 12,
place = {United States},
year = {2017},
month = may
}