Towards Enhancing Coding Productivity for GPU Programming Using Static Graphs
- Barcelona Supercomputing Center (BSC) (Spain)
- Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
The main contribution of this work is to increase the coding productivity of GPU programming by using the concept of Static Graphs. GPU capabilities have increased significantly in terms of performance and memory capacity. However, scalability problems remain, as do limits on the amount of work that a GPU can perform at a time. To minimize the overhead associated with launching GPU kernels, and to maximize the use of GPU capacity, we have combined the new CUDA Graph API with the CUDA programming model (including the CUDA math libraries) and the OpenACC programming model. As test cases we use two well-known and widely used problems in HPC and AI: the Conjugate Gradient method and Particle Swarm Optimization. In the first test case (Conjugate Gradient), we focus on the integration of Static Graphs with CUDA. Here we significantly outperform the NVIDIA reference code, reaching a speedup of up to 11x thanks to a better implementation that benefits from the new CUDA Graph capabilities. In the second test case (Particle Swarm Optimization), we complement the OpenACC functionality with CUDA Graph, again achieving speedups of up to one order of magnitude, with average speedups ranging from 2x to 4x and performance very close to that of an optimized reference CUDA code. Our main target is a more productive coding model for GPU programming based on Static Graphs, which transparently provides better exploitation of GPU capacity. Combining Static Graphs with two of today's most important GPU programming models (CUDA and OpenACC) considerably reduces execution time relative to using CUDA or OpenACC alone, with speedups exceeding one order of magnitude in the best cases. Finally, we propose an interface to incorporate the concept of Static Graphs into the OpenACC specification.
- Research Organization:
- Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
- Sponsoring Organization:
- European Union’s Horizon 2020; USDOE Office of Science (SC)
- Grant/Contract Number:
- AC05-00OR22725
- OSTI ID:
- 1883753
- Journal Information:
- Electronics, Vol. 11, Issue 9; ISSN 2079-9292
- Publisher:
- MDPI
- Country of Publication:
- United States
- Language:
- English
Similar Records
- Static Graphs for Coding Productivity in OpenACC · Conference · Tue Nov 30 23:00:00 EST 2021 · OSTI ID:1883754
- Nek5000 with OpenACC · Journal Article · Lecture Notes in Computer Science · OSTI ID:1567377
- OpenACC acceleration for the algorithm in Nek5000 · Journal Article · Tue May 28 20:00:00 EDT 2019 · Journal of Parallel and Distributed Computing · OSTI ID:1571263