Title: Covering Resilience: A Recent Development for Binomial Checkpointing

Conference · OSTI ID:1366299

In terms of computing time, adjoint methods offer a very attractive way to compute gradient information, which is required, e.g., for optimization. However, this favorable temporal complexity comes with a memory requirement that is in essence proportional to the operation count of the underlying function, e.g., if algorithmic differentiation is used to provide the adjoints. For this reason, checkpointing approaches in many variants have become popular. This paper analyzes an extension of the so-called binomial approach that also covers possible failures of the computing system. Such a precaution is of special interest for massively parallel simulations and adjoint calculations where the mean time between failures of the large-scale computing system is smaller than the time needed to complete the calculation of the adjoint information. We describe the extensions of standard checkpointing approaches required for such resilience, provide a corresponding implementation, and discuss first numerical results.

Research Organization:
Argonne National Lab. (ANL), Argonne, IL (United States)
Sponsoring Organization:
USDOE Office of Science (SC)
DOE Contract Number:
AC02-06CH11357
OSTI ID:
1366299
Resource Relation:
Conference: 7th International Conference on Algorithmic Differentiation, 09/12/16 - 09/15/16, Oxford, GB
Country of Publication:
United States
Language:
English

Similar Records

Extending the Binomial Checkpointing Technique for Resilience
Conference · Mon Oct 10 00:00:00 EDT 2016 · OSTI ID:1366299

Resiliency in numerical algorithm design for extreme scale simulations
Journal Article · Fri Dec 10 00:00:00 EST 2021 · International Journal of High Performance Computing Applications · OSTI ID:1366299

Node failure resiliency for Uintah without checkpointing
Journal Article · Sun Jun 02 00:00:00 EDT 2019 · Concurrency and Computation. Practice and Experience · OSTI ID:1366299
