Fault Management Workshop Final Report, August 13, 2012
- Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
- Argonne National Lab. (ANL), Argonne, IL (United States)
- Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
- Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
- Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
- Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
- Intel, Santa Clara, CA (United States)
- IBM, Armonk, NY (United States)
- Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
- USDOE Office of Science (SC), Washington, D.C. (United States)
A Department of Energy (DOE) Fault Management Workshop was held on June 6, 2012, at the BWI Airport Marriott hotel in Maryland. The goals of the workshop were to: 1) describe the HPC resilience required for critical DOE mission needs; 2) detail the HPC resilience research already under way at the DOE national laboratories and the research expected from industry or other groups; 3) determine which fault management research is a priority for DOE’s Office of Science and National Nuclear Security Administration (NNSA) over the next five years; and 4) develop a roadmap for completing the necessary research by the time the large computing facilities across DOE will need it.
- Research Organization: USDOE Office of Science (SC), Washington, D.C. (United States)
- Sponsoring Organization: USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR) (SC-21)
- OSTI ID: 1471121
- Country of Publication: United States
- Language: English