SciTech Connect

Title: Addressing failures in exascale computing

We present a report produced by a workshop on “Addressing Failures in Exascale Computing” held in Park City, Utah, August 4–11, 2012. The charter of this workshop was to establish a common taxonomy of resilience across all levels of a computing system; to discuss existing knowledge on resilience across the various hardware and software layers of an exascale system; and to build on those results by examining potential solutions from both a hardware and a software perspective, with a focus on a combined approach. The workshop brought together participants with expertise in applications, system software, and hardware; they came from industry, government, and academia; and their interests ranged from theory to implementation. This combination allowed broad and comprehensive discussions and led to this document, which summarizes and builds on those discussions.
OSTI Identifier:
1176844
Report Number(s):
PNNL-SA-101991
KJ0402000
DOE Contract Number:
AC05-76RL01830
Resource Type:
Journal Article
Resource Relation:
Journal Name: International Journal of High Performance Computing Applications, 28(2):129-173
Research Org:
Pacific Northwest National Laboratory (PNNL), Richland, WA (US)
Sponsoring Org:
USDOE
Country of Publication:
United States
Language:
English