U.S. Department of Energy
Office of Scientific and Technical Information

Fault Management Workshop Final Report, August 13, 2012

Technical Report · DOI: https://doi.org/10.2172/1471121 · OSTI ID: 1471121
  1. Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
  2. Argonne National Lab. (ANL), Argonne, IL (United States)
  3. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
  4. Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
  5. Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
  6. Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
  7. Intel, Santa Clara, CA (United States)
  8. IBM, Armonk, NY (United States)
  9. Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
  10. USDOE Office of Science (SC), Washington, D.C. (United States)

A Department of Energy (DOE) Fault Management Workshop was held on June 6, 2012, at the BWI Airport Marriott hotel in Maryland. The goals of this workshop were to: 1) describe the HPC resilience required for critical DOE mission needs; 2) detail what HPC resilience research is already being done at the DOE national laboratories and what is expected to be done by industry or other groups; 3) determine what fault management research is a priority for DOE's Office of Science and National Nuclear Security Administration (NNSA) over the next five years; and 4) develop a roadmap for accomplishing the necessary research by the time it is needed by the large computing facilities across DOE.

Research Organization:
USDOE Office of Science (SC), Washington, D.C. (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR) (SC-21)
OSTI ID:
1471121
Country of Publication:
United States
Language:
English

Similar Records

2009 fault tolerance for extreme-scale computing workshop, Albuquerque, NM - March 19-20, 2009.
Technical Report · Sat Jan 31 23:00:00 EST 2009 · OSTI ID:971988

Risk Management Techniques and Practice Workshop Workshop Report
Technical Report · Mon Dec 01 23:00:00 EST 2008 · OSTI ID:949820

Programming the next generation of supercomputers: proceedings for the Argonne workshop
Technical Report · Mon Oct 01 00:00:00 EDT 1984 · OSTI ID:5857600