DOE PAGES
U.S. Department of Energy, Office of Scientific and Technical Information

Title: Hierarchical Roofline analysis for GPUs: Accelerating performance optimization for the NERSC‐9 Perlmutter system

Authors:
Yang, Charlene [1]; Kurth, Thorsten [1]; Williams, Samuel [2]
  1. National Energy Research Scientific Computing Center (NERSC), Lawrence Berkeley National Laboratory, Berkeley, California
  2. Computational Research Division (CRD), Lawrence Berkeley National Laboratory, Berkeley, California
Publication Date: November 2019
Sponsoring Org.: USDOE
OSTI Identifier: 1574050
Grant/Contract Number: AC02-05CH11231
Resource Type: Publisher's Accepted Manuscript
Journal Name: Concurrency and Computation. Practice and Experience
Additional Journal Information: Journal Volume: 32; Journal Issue: 20; Journal ID: ISSN 1532-0626
Publisher: Wiley Blackwell (John Wiley & Sons)
Country of Publication: United Kingdom
Language: English

Citation Formats

Yang, Charlene, Kurth, Thorsten, and Williams, Samuel. Hierarchical Roofline analysis for GPUs: Accelerating performance optimization for the NERSC‐9 Perlmutter system. United Kingdom: N. p., 2019. Web. doi:10.1002/cpe.5547.
Yang, Charlene, Kurth, Thorsten, & Williams, Samuel. Hierarchical Roofline analysis for GPUs: Accelerating performance optimization for the NERSC‐9 Perlmutter system. United Kingdom. doi:10.1002/cpe.5547.
Yang, Charlene, Kurth, Thorsten, and Williams, Samuel. 2019. "Hierarchical Roofline analysis for GPUs: Accelerating performance optimization for the NERSC‐9 Perlmutter system". United Kingdom. doi:10.1002/cpe.5547.
@article{osti_1574050,
title = {Hierarchical Roofline analysis for GPUs: Accelerating performance optimization for the NERSC‐9 Perlmutter system},
author = {Yang, Charlene and Kurth, Thorsten and Williams, Samuel},
abstractNote = {},
doi = {10.1002/cpe.5547},
journal = {Concurrency and Computation. Practice and Experience},
number = 20,
volume = 32,
place = {United Kingdom},
year = {2019},
month = {11}
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record
DOI: 10.1002/cpe.5547


Works referenced in this record:

An Empirical Roofline Methodology for Quantitatively Assessing Performance Portability
conference, November 2018

  • Yang, Charlene; Gayatri, Rahulkumar; Kurth, Thorsten
  • 2018 IEEE/ACM International Workshop on Performance, Portability and Productivity in HPC (P3HPC)
  • DOI: 10.1109/P3HPC.2018.00005

Deep Residual Learning for Image Recognition
conference, June 2016

  • He, Kaiming; Zhang, Xiangyu; Ren, Shaoqing
  • 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
  • DOI: 10.1109/CVPR.2016.90

Roofline: an insightful visual performance model for multicore architectures
journal, April 2009

  • Williams, Samuel; Waterman, Andrew; Patterson, David
  • Communications of the ACM, Vol. 52, Issue 4
  • DOI: 10.1145/1498765.1498785

Electron self-energy calculation using a general multi-pole approximation
journal, April 2003

  • Soininen, J. A.; Rehr, J. J.; Shirley, Eric L.
  • Journal of Physics: Condensed Matter, Vol. 15, Issue 17
  • DOI: 10.1088/0953-8984/15/17/312

Demystifying Parallel and Distributed Deep Learning: An In-depth Concurrency Analysis
journal, August 2019

  • Ben-Nun, Tal; Hoefler, Torsten
  • ACM Computing Surveys, Vol. 52, Issue 4
  • DOI: 10.1145/3320060