OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Matrix Multiply Performance of GPUs on Exascale-class HPE/Cray Systems

Conference · OSTI ID: 2224210

The computation of dense matrix-matrix products (GEMMs) is central to many modeling and simulation workloads as well as AI/ML deep learning campaigns. In fact, millions of dollars are spent annually on computing GEMMs, and large model training demands are increasing exponentially. Specialized processors such as GPUs are designed to perform well for these operations. However, GEMM performance on GPUs can exhibit complex behavior that depends on many factors, which makes it challenging to optimize. In this study, we examine GEMM performance on several leading GPU models drawn from the product lines to be deployed in forthcoming exascale computing systems. We show results that illustrate the many factors that can affect GEMM performance on GPUs. We then present data collected from a large number of test runs of an example GEMM operation to show how the GEMM rate depends on the matrix dimensions. Finally, we show results from machine learning-based performance models that use novel feature engineering methods to fit the measured performance, providing a potential basis for GEMM performance tuning and autotuning methods for GPUs. Recommendations are also given for achieving high GEMM performance on modern GPUs.
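As a rough illustration of what "GEMM rate as a function of matrix dimensions" means, the sketch below times a matrix product for a few (m, n, k) shapes and converts each time to GFLOP/s using the conventional 2·m·n·k operation count. It is not the benchmark used in the study: the pure-Python triple loop is only a CPU stand-in for a GPU library call, and the names `gemm_flops`, `naive_gemm`, and `measure_rate` are hypothetical.

```python
import time


def gemm_flops(m, n, k):
    """Conventional operation count for C (m x n) = A (m x k) @ B (k x n)."""
    return 2 * m * n * k  # one multiply and one add per inner-product term


def naive_gemm(a, b):
    """Reference triple-loop GEMM on lists of lists (CPU stand-in, an assumption)."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for p in range(k):
            a_ip = a[i][p]
            for j in range(n):
                c[i][j] += a_ip * b[p][j]
    return c


def measure_rate(m, n, k):
    """Time one GEMM call and return the achieved rate in GFLOP/s."""
    a = [[1.0] * k for _ in range(m)]
    b = [[1.0] * n for _ in range(k)]
    start = time.perf_counter()
    naive_gemm(a, b)
    # Guard against a zero reading on very coarse timers.
    elapsed = max(time.perf_counter() - start, 1e-12)
    return gemm_flops(m, n, k) / elapsed / 1e9


if __name__ == "__main__":
    # Sweep a few shapes; the last has the same operation count as the second
    # (2 * 64**3) but a different aspect ratio.
    for m, n, k in [(32, 32, 32), (64, 64, 64), (64, 16, 256)]:
        print(f"m={m} n={n} k={k}: {measure_rate(m, n, k):.4f} GFLOP/s")
```

Comparing shapes with equal operation counts but different aspect ratios, as in the last two entries of the sweep, is one simple way to see that the rate depends on the individual dimensions and not only on the total work.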

Research Organization:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)
DOE Contract Number:
AC05-00OR22725
OSTI ID:
2224210
Resource Relation:
Conference: Cray User Group CUG 2022 - Monterey, California, United States of America - 5/2/2022-5/5/2022
Country of Publication:
United States
Language:
English

Similar Records

Early experiences evaluating the HPE/Cray ecosystem for AMD GPUs
Journal Article · Apr 11, 2024 · Concurrency and Computation: Practice and Experience · OSTI ID:2224210

Data Locality Enhancement of Dynamic Simulations for Exascale Computing (Final Report)
Technical Report · Nov 29, 2019 · OSTI ID:2224210

Modeling Cooperative Threads to Project GPU Performance for Adaptive Parallelism
Conference · Jan 1, 2015 · OSTI ID:2224210

Related Subjects