On Undecidability Aspects of Resilient Computations and Implications to Exascale

Rao, Nageswara S

Conference · 2014

OSTI ID:1163594

Rao, Nageswara S (ORNL)

Future Exascale computing systems, with their large numbers of processors, memory elements, and interconnection links, are expected to experience multiple, complex faults that affect both applications and operating-runtime systems. A variety of algorithms, frameworks, and tools are being proposed to realize and/or verify the resilience properties of computations that guarantee correct results on failure-prone computing systems. We analytically show that certain resilient computation problems in the presence of general classes of faults are undecidable, that is, no algorithms exist for solving them. We first show that membership verification in a generic set of resilient computations is undecidable. We then describe classes of faults that can create infinite loops or non-halting computations, whose detection in general is undecidable. Next, we show certain resilient computation problems to be undecidable by using reductions from the loop detection and halting problems under two formulations, namely, an abstract programming language and Turing machines, respectively. These two reductions highlight different failure effects: the former represents program and data corruption, and the latter illustrates incorrect program execution. These results call for broad-based, well-characterized resilience approaches that complement purely computational solutions with methods such as hardware monitors, co-designs, and system- and application-specific diagnosis codes.
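The abstract's point that a fault can turn a terminating computation into a non-halting one can be illustrated with a small sketch. This is not the paper's construction: the function name `run_with_fault`, the single-bit-flip fault model, and the step budget `max_steps` are all assumptions made here for illustration. The sketch simulates the loop `i = 0; while i != n: i += 1` and optionally flips one bit of the counter at a chosen step.

```python
from typing import Optional


def run_with_fault(
    n: int,
    fault_step: Optional[int] = None,
    fault_bit: int = 0,
    max_steps: int = 10_000,
) -> Optional[int]:
    """Simulate `i = 0; while i != n: i += 1` with an optional bit flip in i.

    Returns the number of loop iterations if the loop halts within
    max_steps, or None if the budget is exhausted. A None result does
    NOT prove non-termination; it only says the monitor gave up.
    """
    i = 0
    steps = 0
    while i != n:
        if steps >= max_steps:
            return None  # budget exhausted, outcome undetermined
        if fault_step is not None and steps == fault_step:
            # Hypothetical transient fault: flip one bit of the counter.
            i ^= 1 << fault_bit
        i += 1
        steps += 1
    return steps


if __name__ == "__main__":
    print(run_with_fault(10))                             # fault-free run
    print(run_with_fault(10, fault_step=3, fault_bit=0))  # fault delays halting
    print(run_with_fault(10, fault_step=3, fault_bit=5))  # counter jumps past n
```

In the last call the flipped bit pushes the counter past `n`, so the exit test `i != n` never fails and the bounded monitor can only report that the budget ran out. Whether an arbitrary program under an arbitrary fault eventually halts is exactly the kind of question the abstract identifies as undecidable in general, which is why a fixed step budget is a heuristic rather than a decision procedure.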

Research Organization: Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)

Sponsoring Organization: USDOE Office of Science (SC)

DOE Contract Number: DE-AC05-00OR22725

OSTI ID: 1163594

Resource Relation: Conference: Euro-Par 2014: Parallel Processing Workshops: Resilience 2014, Porto, Portugal, August 24-28, 2014

Country of Publication: United States

Language: English

Similar Records

Data Locality Enhancement of Dynamic Simulations for Exascale Computing (Final Report)

Technical Report · Fri Nov 29 2019 · OSTI ID:1163594

Shen, Xipeng

Holistic Measurement Driven Resilience: Combining Operational Fault and Failure Measurements and Fault Injection for Quantifying Fault Detection, Propagation and Impact. Final report

Technical Report · Thu Apr 16 2020 · OSTI ID:1163594

Kramer, William; Jha, Saurabh; Brandt, James; +1 more

Resiliency in numerical algorithm design for extreme scale simulations

Journal Article · Fri Dec 10 2021 · International Journal of High Performance Computing Applications · OSTI ID:1163594

Agullo, Emmanuel; Altenbernd, Mirco; Anzt, Hartwig; +33 more

Related Subjects

Exascale systems
resilient computations
undecidability
uncomputability
