Title: Instruction Roofline: An insightful visual performance model for GPUs

Conference · DOI: https://doi.org/10.1002/cpe.6591 · OSTI ID: 1844927

The Roofline performance model provides an intuitive approach to identifying performance bottlenecks and guiding performance optimization. However, the classic FLOP-centric approach is ill-suited to emerging applications that perform more integer operations than floating-point operations. In this article, we reintroduce our Instruction Roofline Model for NVIDIA GPUs and expand our evaluation of it. The Instruction Roofline combines instructions and memory transactions across all levels of the memory hierarchy, and provides more performance insights than the FLOP-oriented Roofline Model, namely instruction throughput, strided memory access patterns, bank conflicts, and thread predication. We use our Instruction Roofline methodology to analyze eight proxy applications: HPGMG from AMReX, Matrix Transpose benchmarks, ADEPT from MetaHipMer's sequence alignment phase, EXTENSION from MetaHipMer's local assembly phase, CUSP, cuSPARSE, cudaTensorCoreGemm, and cuBLAS. We demonstrate that our methodology exposes various aspects of performance and performance bottlenecks on NVIDIA GPUs and motivates code optimizations.
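The roofline idea summarized above can be illustrated with a minimal sketch. The abstract does not define units or give ceilings, so everything here is an assumption made for illustration: intensity is taken as warp-level instructions per memory transaction, throughput is in billions of instructions per second (GIPS), and the function names and numeric ceilings are hypothetical rather than measured values for any GPU.

```python
def instruction_intensity(warp_instructions: float, memory_transactions: float) -> float:
    """Instructions executed per memory transaction (assumed definition)."""
    if memory_transactions <= 0:
        raise ValueError("memory_transactions must be positive")
    return warp_instructions / memory_transactions


def attainable_gips(intensity: float, peak_gips: float, bandwidth_gtxn_per_s: float) -> float:
    """Generic roofline bound: the lower of the compute ceiling and the
    memory ceiling scaled by intensity."""
    if intensity < 0:
        raise ValueError("intensity must be non-negative")
    return min(peak_gips, intensity * bandwidth_gtxn_per_s)


# Hypothetical ceilings, chosen only to make the arithmetic easy to follow.
PEAK_GIPS = 500.0           # assumed instruction-throughput ceiling
BANDWIDTH_GTXN_PER_S = 25.0  # assumed memory ceiling, in 1e9 transactions/s

# A hypothetical kernel: 2 million warp instructions, 1 million transactions.
kernel_intensity = instruction_intensity(2_000_000, 1_000_000)
kernel_bound = attainable_gips(kernel_intensity, PEAK_GIPS, BANDWIDTH_GTXN_PER_S)

# Memory-bound if the scaled memory ceiling is below the compute ceiling.
is_memory_bound = kernel_intensity * BANDWIDTH_GTXN_PER_S < PEAK_GIPS
```

With these assumed numbers the kernel sits on the sloped part of the roofline, so the bound comes from the memory ceiling rather than from instruction throughput. Repeating the calculation with a separate bandwidth ceiling for each memory level would give one bound per level, in the spirit of the hierarchical view the abstract describes.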

Research Organization:
Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)
DOE Contract Number:
AC02-05CH11231
OSTI ID:
1844927
Resource Relation:
Conference: Concurrency and Computation: Practice and Experience
Country of Publication:
United States
Language:
English
