U.S. Department of Energy
Office of Scientific and Technical Information

Exploring the Interplay of Resilience and Energy Consumption for a Task-Based Partial Differential Equations Preconditioner

Technical Report · DOI: https://doi.org/10.2172/1561016 · OSTI ID: 1561016
 [1];  [1];  [1];  [2];  [1];  [3];  [2];  [1]
  1. Sandia National Lab. (SNL-CA), Livermore, CA (United States)
  2. Duke Univ., Durham, NC (United States)
  3. Centre National de la Recherche Scientifique (CNRS), Orsay (France). Laboratoire d'Informatique pour la Mécanique et les Sciences de l'ingénieur (LIMSI)
We discuss algorithm-based resilience to silent data corruption (SDC) in a task-based domain-decomposition preconditioner for partial differential equations (PDEs). The algorithm exploits a reformulation of the PDE as a sampling problem, followed by a solution update through data manipulation that is resilient to SDC. The implementation is based on a server-client model in which all state information is held by the servers, while clients are designed solely as computational units. Scalability tests run on up to approximately 51K cores show a parallel efficiency greater than 90%. We use a 2D elliptic PDE and a fault model based on random single bit flips to demonstrate the resilience of the application to synthetically injected SDC. We discuss two fault scenarios: one based on the corruption of all data of a target task, and the other involving the corruption of a single data point. We show that for our application, given the test problem considered, a four-fold increase in the number of faults raises the overhead needed to overcome them by only 2 percentage points, from 7% to 9%. We then discuss potential savings in energy consumption via dynamic voltage/frequency scaling, and its interplay with fault rates and application overhead.
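The fault model named in the abstract (random single bit flips, applied either to every value produced by a task or to one data point) can be illustrated with a minimal Python sketch. This is not the report's implementation: the function names `flip_bit`, `corrupt_single_point`, and `corrupt_all_points` are hypothetical, and the sketch assumes task data is a list of IEEE 754 double-precision values with the bit position drawn uniformly at random.

```python
import random
import struct


def flip_bit(value: float, bit: int) -> float:
    """Return `value` with one bit of its 64-bit IEEE 754 encoding flipped.

    Bit 0 is the least significant mantissa bit; bit 63 is the sign bit.
    """
    if not 0 <= bit < 64:
        raise ValueError("bit must be in [0, 63]")
    # Reinterpret the double as an unsigned 64-bit integer, XOR one bit,
    # and reinterpret the result as a double again.
    (as_int,) = struct.unpack("<Q", struct.pack("<d", value))
    (flipped,) = struct.unpack("<d", struct.pack("<Q", as_int ^ (1 << bit)))
    return flipped


def corrupt_single_point(data, rng):
    """Hypothetical scenario: flip one random bit in one random entry."""
    out = list(data)
    i = rng.randrange(len(out))
    out[i] = flip_bit(out[i], rng.randrange(64))
    return out


def corrupt_all_points(data, rng):
    """Hypothetical scenario: flip one random bit in every entry of a task's data."""
    return [flip_bit(x, rng.randrange(64)) for x in data]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so a run is repeatable
    task_output = [1.0, 2.0, 3.0, 4.0]
    print(corrupt_single_point(task_output, rng))
    print(corrupt_all_points(task_output, rng))
```

The effect of a flip depends strongly on the bit position: a flip in the low mantissa bits perturbs the value only slightly, while a flip in the exponent or sign bit can change it by orders of magnitude or negate it.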
Research Organization:
Sandia National Laboratories (SNL-CA), Livermore, CA (United States); Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States). National Energy Research Scientific Computing Center (NERSC)
Sponsoring Organization:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR); USDOE National Nuclear Security Administration (NNSA)
DOE Contract Number:
AC04-94AL85000; AC02-05CH11231
OSTI ID:
1561016
Report Number(s):
SAND--2016-2091; 619982
Country of Publication:
United States
Language:
English

Similar Records

Exploring the interplay of resilience and energy consumption for a task-based partial differential equations preconditioner
Journal Article · Wed May 24 20:00:00 EDT 2017 · Parallel Computing · OSTI ID:1478742

ULFM-MPI Implementation of a Resilient Task-Based Partial Differential Equations Preconditioner [Poster]
Technical Report · Sun May 01 00:00:00 EDT 2016 · OSTI ID:1561476

Partial differential equations preconditioner resilient to soft and hard faults
Journal Article · Sat Jan 28 19:00:00 EST 2017 · International Journal of High Performance Computing Applications · OSTI ID:1544016
