Matrix Multiply Performance of GPUs on Exascale-class HPE/Cray Systems
- ORNL
The computation of dense matrix-matrix products (GEMMs) is central to many modeling and simulation workloads as well as AI/ML deep learning campaigns. In fact, millions of dollars are spent annually on computing GEMMs, and large model training demands are increasing exponentially. Specialized processors such as GPUs are designed to perform well for these operations. However, GEMM performance on GPUs can exhibit complex behaviors depending on many factors, making it challenging to optimize. In this study we examine GEMM performance on several leading GPU models from the product lines to be deployed in forthcoming exascale computing systems. We present results illustrating the many factors that can affect GEMM performance on GPUs. We then present data collected from a large number of test runs for an example GEMM operation to show how the GEMM rate depends on matrix dimensions. Finally, we show results from machine learning-based performance models using novel feature engineering methods to fit the measured performance, providing a potential basis for GEMM performance tuning and autotuning methods for GPUs. Recommendations are also given for how to achieve high GEMM performance on modern GPUs.
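The dependence of GEMM rate on matrix dimensions mentioned in the abstract is typically measured by timing C = A·B for varying (m, n, k) and converting the elapsed time to a flop rate using the standard 2·m·n·k operation count. A minimal sketch of such a measurement harness is below; it uses NumPy on the CPU purely for illustration (the study itself benchmarks vendor GPU BLAS libraries), and the function name `gemm_gflops` is a hypothetical choice, not from the paper.

```python
import time
import numpy as np

def gemm_gflops(m, n, k, dtype=np.float32, trials=5):
    """Return the best observed GEMM rate in GFLOP/s for C = A @ B.

    A dense m-by-k times k-by-n product performs 2*m*n*k floating-point
    operations (one multiply and one add per inner-product term).
    """
    rng = np.random.default_rng(0)
    a = rng.random((m, k)).astype(dtype)
    b = rng.random((k, n)).astype(dtype)
    best = float("inf")
    for _ in range(trials):
        t0 = time.perf_counter()
        a @ b  # the GEMM being timed
        best = min(best, time.perf_counter() - t0)
    return 2.0 * m * n * k / best / 1e9

# Sweeping a dimension, as in the study's test runs, exposes the
# shape dependence of the achieved rate:
for k in (64, 256, 1024):
    rate = gemm_gflops(512, 512, k)
```

Sweeping one dimension while holding the others fixed is how performance "surfaces" over (m, n, k) are built up for the kind of model fitting the abstract describes; on GPUs the same harness would wrap a cuBLAS or rocBLAS call with device-side synchronization before and after timing.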
- Research Organization:
- Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
- Sponsoring Organization:
- USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)
- DOE Contract Number:
- AC05-00OR22725
- OSTI ID:
- 2224210
- Resource Relation:
- Conference: Cray User Group (CUG) 2022, Monterey, California, United States of America, May 2-5, 2022
- Country of Publication:
- United States
- Language:
- English