Exploring the Interplay of Resilience and Energy Consumption for a Task-Based Partial Differential Equations Preconditioner
- Sandia National Lab. (SNL-CA), Livermore, CA (United States)
- Duke Univ., Durham, NC (United States)
- Centre National de la Recherche Scientifique (CNRS), Orsay (France). Laboratoire d'Informatique pour la Mécanique et les Sciences de l'ingénieur (LIMSI)
We discuss algorithm-based resilience to silent data corruption (SDC) in a task-based domain-decomposition preconditioner for partial differential equations (PDEs). The algorithm exploits a reformulation of the PDE as a sampling problem, followed by a solution update through data manipulation that is resilient to SDC. The implementation is based on a server-client model where all state information is held by the servers, while clients are designed solely as computational units. Scalability tests on up to ~51K cores show a parallel efficiency greater than 90%. We use a 2D elliptic PDE and a fault model based on random single bit-flips to demonstrate the resilience of the application to synthetically injected SDC. We discuss two fault scenarios: one based on the corruption of all data of a target task, and the other involving the corruption of a single data point. We show that for our application, given the test problem considered, a four-fold increase in the number of faults raises the overhead needed to overcome them by only two percentage points, from 7% to 9%. We then discuss potential savings in energy consumption via dynamic voltage/frequency scaling, and its interplay with fault rates and application overhead.
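The fault model described in the abstract (random single bit-flips, applied either to every data point of a target task or to one data point) can be illustrated with a minimal sketch. This is not the authors' injection code: the function names `flip_bit`, `corrupt_single_point`, and `corrupt_all_points` are hypothetical, and the sketch assumes task data are IEEE-754 double-precision values held in a plain Python list.

```python
import random
import struct


def flip_bit(value: float, bit: int) -> float:
    """Return `value` with one bit (0..63) of its IEEE-754 double encoding flipped."""
    if not 0 <= bit < 64:
        raise ValueError("bit must be in [0, 63]")
    # Reinterpret the double as a 64-bit unsigned integer, XOR one bit, convert back.
    (as_int,) = struct.unpack("<Q", struct.pack("<d", value))
    (flipped,) = struct.unpack("<d", struct.pack("<Q", as_int ^ (1 << bit)))
    return flipped


def corrupt_single_point(data, rng):
    """Assumed scenario: one randomly chosen data point receives one random bit-flip."""
    out = list(data)
    i = rng.randrange(len(out))
    out[i] = flip_bit(out[i], rng.randrange(64))
    return out


def corrupt_all_points(data, rng):
    """Assumed scenario: every data point of the task receives one random bit-flip."""
    return [flip_bit(x, rng.randrange(64)) for x in data]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the injected faults are reproducible
    task_data = [1.0, 2.0, 3.0, 4.0]
    print(corrupt_single_point(task_data, rng))
    print(corrupt_all_points(task_data, rng))
```

Depending on which bit is hit, a flip can perturb a value in its last digit, change its sign, or turn it into infinity or NaN, which is why a resilient solution update has to tolerate corrupted values of very different magnitudes.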
- Research Organization: Sandia National Laboratories (SNL-CA), Livermore, CA (United States); Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States). National Energy Research Scientific Computing Center (NERSC)
- Sponsoring Organization: USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR); USDOE National Nuclear Security Administration (NNSA)
- DOE Contract Number: AC04-94AL85000; AC02-05CH11231
- OSTI ID: 1561016
- Report Number(s): SAND--2016-2091; 619982
- Country of Publication: United States
- Language: English
Similar Records
Exploring the interplay of resilience and energy consumption for a task-based partial differential equations preconditioner
Journal Article · Wed May 24 20:00:00 EDT 2017 · Parallel Computing · OSTI ID:1478742
ULFM-MPI Implementation of a Resilient Task-Based Partial Differential Equations Preconditioner [Poster]
Technical Report · Sun May 01 00:00:00 EDT 2016 · OSTI ID:1561476
Partial differential equations preconditioner resilient to soft and hard faults
Journal Article · Sat Jan 28 19:00:00 EST 2017 · International Journal of High Performance Computing Applications · OSTI ID:1544016