skip to main content

SciTech ConnectSciTech Connect

Title: Addressing failures in exascale computing

We present here a report produced by a workshop on “Addressing Failures in Exascale Computing” held in Park City, Utah, August 4–11, 2012. The charter of this workshop was to establish a common taxonomy about resilience across all the levels in a computing system; discuss existing knowledge on resilience across the various hardware and software layers of an exascale system; and build on those results, examining potential solutions from both a hardware and software perspective and focusing on a combined approach. The workshop brought together participants with expertise in applications, system software, and hardware; they came from industry, government, and academia; and their interests ranged from theory to implementation. The combination allowed broad and comprehensive discussions and led to this document, which summarizes and builds on those discussions.
; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; more »; ; ; ; ; ; ; ; « less
Publication Date:
OSTI Identifier:
Report Number(s):
DOE Contract Number:
Resource Type:
Journal Article
Resource Relation:
Journal Name: International Journal of High Performance Computing Applications, 28(2):129-173
Research Org:
Pacific Northwest National Laboratory (PNNL), Richland, WA (US)
Sponsoring Org:
Country of Publication:
United States