Instruction Roofline: An insightful visual performance model for GPUs
The Roofline performance model provides an intuitive approach to identifying performance bottlenecks and guiding performance optimization. However, the classic FLOP-centric approach is inappropriate for emerging applications that perform more integer than floating-point operations. In this article, we reintroduce our Instruction Roofline Model for NVIDIA GPUs and expand our evaluation of it. The Instruction Roofline incorporates instructions and memory transactions across all levels of the memory hierarchy, and it provides performance insights beyond those of the FLOP-oriented Roofline Model: instruction throughput, strided memory access patterns, bank conflicts, and thread predication. We use our Instruction Roofline methodology to analyze eight proxy applications: HPGMG from AMReX, Matrix Transpose benchmarks, ADEPT from MetaHipMer's sequence-alignment phase, EXTENSION from MetaHipMer's local-assembly phase, CUSP, cuSPARSE, cudaTensorCoreGemm, and cuBLAS. We demonstrate the ability of our methodology to explain various aspects of performance and performance bottlenecks on NVIDIA GPUs and to motivate code optimizations.
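The core of any Roofline-style model is a min of a compute ceiling and a bandwidth ceiling scaled by intensity; in the Instruction Roofline, throughput is measured in warp-level GIPS and intensity in warp instructions per memory transaction rather than FLOP/s and FLOP/byte. A minimal sketch of that bound follows; the V100-like peak numbers are illustrative assumptions, not figures from this record.

```python
def instruction_roofline(peak_gips, peak_gtxn_per_s, intensity):
    """Attainable warp-instruction throughput (GIPS).

    peak_gips       -- compute ceiling, warp-level giga-instructions/s
    peak_gtxn_per_s -- memory ceiling, giga-transactions/s for one
                       level of the memory hierarchy
    intensity       -- instruction intensity: warp instructions issued
                       per memory transaction at that level
    """
    # The kernel is bound either by instruction issue or by how fast
    # the memory level can serve transactions at this intensity.
    return min(peak_gips, intensity * peak_gtxn_per_s)


# Illustrative (assumed) ceilings loosely modeled on a V100-class GPU:
# ~489.6 warp GIPS compute peak, ~28.1 HBM giga-transactions/s.
PEAK_GIPS = 489.6
PEAK_HBM_GTXN = 28.1

low = instruction_roofline(PEAK_GIPS, PEAK_HBM_GTXN, 1.0)    # memory-bound
high = instruction_roofline(PEAK_GIPS, PEAK_HBM_GTXN, 100.0) # compute-bound
```

Plotting this bound against measured (intensity, GIPS) points per memory level is what places a kernel under the appropriate ceiling and exposes, for example, strided access (low transaction efficiency) or bank conflicts (inflated shared-memory transactions).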
- Research Organization:
- Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
- Sponsoring Organization:
- USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)
- DOE Contract Number:
- AC02-05CH11231
- OSTI ID:
- 1844927
- Resource Relation:
- Conference: Concurrency and Computation: Practice and Experience
- Country of Publication:
- United States
- Language:
- English
Similar Records
GPU-acceleration of the ELPA2 distributed eigensolver for dense symmetric and hermitian eigenproblems
An Empirical Roofline Methodology for Quantitatively Assessing Performance Portability