U.S. Department of Energy
Office of Scientific and Technical Information

Risk Management Techniques and Practice Workshop Workshop Report

Technical Report · DOI: https://doi.org/10.2172/949820 · OSTI ID: 949820
At the request of the Department of Energy (DOE) Office of Science (SC), Lawrence Livermore National Laboratory (LLNL) hosted a two-day Risk Management Techniques and Practice (RMTAP) workshop held September 18-19 at the Hotel Nikko in San Francisco. The purpose of the workshop, which was sponsored by the SC/Advanced Scientific Computing Research (ASCR) program and the National Nuclear Security Administration (NNSA)/Advanced Simulation and Computing (ASC) program, was to assess current and emerging techniques, practices, and lessons learned for effectively identifying, understanding, managing, and mitigating the risks associated with acquiring leading-edge computing systems at high-performance computing centers (HPCCs). Representatives from fifteen high-performance computing (HPC) organizations, four HPC vendor partners, and three government agencies attended the workshop.

The overall workshop findings were:
(1) Standard risk management techniques and tools are in the aggregate applicable to projects at HPCCs and are commonly employed by the HPC community;
(2) HPC projects have characteristics that necessitate a tailoring of the standard risk management practices;
(3) All HPCC acquisition projects can benefit by employing risk management, but the specific choice of risk management processes and tools is less important to the success of the project;
(4) The special relationship between the HPCCs and HPC vendors must be reflected in the risk management strategy;
(5) Best practices findings include developing a prioritized risk register with special attention to the top risks, establishing a practice of regular meetings and status updates with the platform partner, supporting regular and open reviews that engage the interests and expertise of a wide range of staff and stakeholders, and documenting and sharing the acquisition/build/deployment experience; and
(6) Top risk categories include system scaling issues, request for proposal/contract and acceptance testing, and vendor technical or business problems.

HPC, by its very nature, is an exercise in multi-level risk management. Every aspect of stewarding HPCCs into the petascale era, from identification of the program drivers to the details of procurement actions and simulation environment component deployments, represents unprecedented challenges and requires effective risk management. The fundamental purpose of this workshop was to go beyond risk management processes as such and learn how to weave effective risk management practices, techniques, and methods into all aspects of migrating HPCCs into the next generation of leadership computing systems. This workshop was a follow-on to the Petascale System Integration Workshop hosted by Lawrence Berkeley National Laboratory (LBNL)/NERSC last year. It was intended to leverage and extend the risk management experience of the participants by looking for common best practices and unique processes that have been especially successful. This workshop assessed the effectiveness of tools and techniques that are or could be helpful in HPCC risk management, with a special emphasis on how practice meets process. As the saying goes: 'In theory, there is no difference between theory and practice. In practice there is'. Finally, the workshop brought together a network of experts who shared information as technology moves into the petascale era and beyond.
Research Organization:
Lawrence Livermore National Laboratory (LLNL), Livermore, CA
Sponsoring Organization:
USDOE
DOE Contract Number:
W-7405-ENG-48
OSTI ID:
949820
Report Number(s):
LLNL-TR-409240
Country of Publication:
United States
Language:
English
