Hierarchical Roofline analysis for GPUs: Accelerating performance optimization for the NERSC‐9 Perlmutter system
- Authors:
- Yang, Charlene; Kurth, Thorsten; Williams, Samuel
- National Energy Research Scientific Computing Center (NERSC), Lawrence Berkeley National Laboratory, Berkeley, California
- Computational Research Division (CRD), Lawrence Berkeley National Laboratory, Berkeley, California
- Publication Date:
- November 2019
- Sponsoring Org.:
- USDOE
- OSTI Identifier:
- 1574050
- Grant/Contract Number:
- AC02-05CH11231
- Resource Type:
- Publisher's Accepted Manuscript
- Journal Name:
- Concurrency and Computation. Practice and Experience
- Additional Journal Information:
- Journal Name: Concurrency and Computation. Practice and Experience; Journal Volume: 32; Journal Issue: 20; Journal ID: ISSN 1532-0626
- Publisher:
- Wiley Blackwell (John Wiley & Sons)
- Country of Publication:
- United Kingdom
- Language:
- English
Citation Formats
Yang, Charlene, Kurth, Thorsten, and Williams, Samuel. Hierarchical Roofline analysis for GPUs: Accelerating performance optimization for the NERSC‐9 Perlmutter system. United Kingdom: N. p., 2019. Web. doi:10.1002/cpe.5547.
Yang, Charlene, Kurth, Thorsten, & Williams, Samuel. Hierarchical Roofline analysis for GPUs: Accelerating performance optimization for the NERSC‐9 Perlmutter system. United Kingdom. doi:10.1002/cpe.5547.
Yang, Charlene, Kurth, Thorsten, and Williams, Samuel. 2019. "Hierarchical Roofline analysis for GPUs: Accelerating performance optimization for the NERSC‐9 Perlmutter system". United Kingdom. doi:10.1002/cpe.5547.
@article{osti_1574050,
title = {Hierarchical Roofline analysis for GPUs: Accelerating performance optimization for the NERSC‐9 Perlmutter system},
author = {Yang, Charlene and Kurth, Thorsten and Williams, Samuel},
abstractNote = {},
doi = {10.1002/cpe.5547},
journal = {Concurrency and Computation. Practice and Experience},
number = 20,
volume = 32,
place = {United Kingdom},
year = {2019},
month = {11}
}
Free Publicly Available Full Text
Publisher's Version of Record
DOI: 10.1002/cpe.5547
Works referenced in this record:
An Empirical Roofline Methodology for Quantitatively Assessing Performance Portability
conference, November 2018
- Yang, Charlene; Gayatri, Rahulkumar; Kurth, Thorsten
- 2018 IEEE/ACM International Workshop on Performance, Portability and Productivity in HPC (P3HPC)
Deep Residual Learning for Image Recognition
conference, June 2016
- He, Kaiming; Zhang, Xiangyu; Ren, Shaoqing
- 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
Roofline: an insightful visual performance model for multicore architectures
journal, April 2009
- Williams, Samuel; Waterman, Andrew; Patterson, David
- Communications of the ACM, Vol. 52, Issue 4
Electron self-energy calculation using a general multi-pole approximation
journal, April 2003
- Soininen, J. A.; Rehr, J. J.; Shirley, Eric L.
- Journal of Physics: Condensed Matter, Vol. 15, Issue 17
Demystifying Parallel and Distributed Deep Learning: An In-depth Concurrency Analysis
journal, August 2019
- Ben-Nun, Tal; Hoefler, Torsten
- ACM Computing Surveys, Vol. 52, Issue 4