Addressing failures in exascale computing
We present a report produced by the workshop on “Addressing Failures in Exascale Computing” held in Park City, Utah, August 4–11, 2012. The charter of this workshop was to establish a common taxonomy of resilience across all levels of a computing system; to discuss existing knowledge on resilience across the hardware and software layers of an exascale system; and to build on those results by examining potential solutions from both a hardware and a software perspective, with a focus on a combined approach. The workshop brought together participants with expertise in applications, system software, and hardware, drawn from industry, government, and academia, with interests ranging from theory to implementation. This combination allowed broad and comprehensive discussions and led to this document, which summarizes and builds on those discussions.
- Research Organization:
- Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
- Sponsoring Organization:
- USDOE
- DOE Contract Number:
- AC05-76RL01830
- OSTI ID:
- 1176844
- Report Number(s):
- PNNL-SA-101991; KJ0402000
- Journal Information:
- International Journal of High Performance Computing Applications, 28(2):129-173
- Country of Publication:
- United States
- Language:
- English