DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Sierra Center of Excellence: Lessons Learned

Abstract

The introduction of heterogeneous computing via GPUs from the Sierra architecture represented a significant shift in direction for computational science at Lawrence Livermore National Laboratory (LLNL), and therefore required significant preparation. Over the last five years, the Sierra Center of Excellence (CoE) has brought employees with specific expertise from IBM and NVIDIA together with LLNL in a concentrated effort to prepare applications, system software, and tools for the Sierra supercomputer. This article shares the process we applied for the CoE and documents lessons learned during the collaboration, with the hope that others will be able to learn from both our success and intermediate setbacks. We describe what we have found to work for the management of such a collaboration and best practices for algorithms and source code, system configuration and software stack, tools, and application performance.

Authors:
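Dahm, Johann P. [1]; Richards, David F. [2]; Black, Aaron [2]; Bertsch, Adam D. [2]; Grinberg, Leopold [3]; Karlin, Ian [2]; Kokkila-Schumacher, Sara [4]; León, Edgar A.; Neely, John R. [2]; Pankajakshan, Ramesh [2]; Pearce, Olga [2]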
  1. Vulcan Inc., Seattle, WA (United States)
  2. Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
  3. IBM Cambridge Research Center, Cambridge, MA (United States)
  4. IBM Research, Yorktown Heights, NY (United States)
Publication Date:
Research Org.:
Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
Sponsoring Org.:
USDOE National Nuclear Security Administration (NNSA)
OSTI Identifier:
1631911
Report Number(s):
LLNL-JRNL-789080
Journal ID: ISSN 0018-8646; 987415
Grant/Contract Number:  
AC52-07NA27344
Resource Type:
Accepted Manuscript
Journal Name:
IBM Journal of Research and Development
Additional Journal Information:
Journal Volume: 64; Journal Issue: 3/4; Journal ID: ISSN 0018-8646
Publisher:
IEEE
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING

Citation Formats

Dahm, Johann P., Richards, David F., Black, Aaron, Bertsch, Adam D., Grinberg, Leopold, Karlin, Ian, Kokkila-Schumacher, Sara, León, Edgar A., Neely, John R., Pankajakshan, Ramesh, and Pearce, Olga. Sierra Center of Excellence: Lessons Learned. United States: N. p., 2019. Web. https://doi.org/10.1147/JRD.2019.2961069.
Dahm, Johann P., Richards, David F., Black, Aaron, Bertsch, Adam D., Grinberg, Leopold, Karlin, Ian, Kokkila-Schumacher, Sara, León, Edgar A., Neely, John R., Pankajakshan, Ramesh, & Pearce, Olga. Sierra Center of Excellence: Lessons Learned. United States. https://doi.org/10.1147/JRD.2019.2961069
Dahm, Johann P., Richards, David F., Black, Aaron, Bertsch, Adam D., Grinberg, Leopold, Karlin, Ian, Kokkila-Schumacher, Sara, León, Edgar A., Neely, John R., Pankajakshan, Ramesh, and Pearce, Olga. Fri . "Sierra Center of Excellence: Lessons Learned". United States. https://doi.org/10.1147/JRD.2019.2961069. https://www.osti.gov/servlets/purl/1631911.
@article{osti_1631911,
title = {Sierra Center of Excellence: Lessons Learned},
author = {Dahm, Johann P. and Richards, David F. and Black, Aaron and Bertsch, Adam D. and Grinberg, Leopold and Karlin, Ian and Kokkila-Schumacher, Sara and León, Edgar A. and Neely, John R. and Pankajakshan, Ramesh and Pearce, Olga},
abstractNote = {The introduction of heterogeneous computing via GPUs from the Sierra architecture represented a significant shift in direction for computational science at Lawrence Livermore National Laboratory (LLNL), and therefore required significant preparation. Over the last five years, the Sierra Center of Excellence (CoE) has brought employees with specific expertise from IBM and NVIDIA together with LLNL in a concentrated effort to prepare applications, system software, and tools for the Sierra supercomputer. This article shares the process we applied for the CoE and documents lessons learned during the collaboration, with the hope that others will be able to learn from both our success and intermediate setbacks. We describe what we have found to work for the management of such a collaboration and best practices for algorithms and source code, system configuration and software stack, tools, and application performance.},
doi = {10.1147/JRD.2019.2961069},
journal = {IBM Journal of Research and Development},
number = {3/4},
volume = 64,
place = {United States},
year = {2019},
month = {12}
}
