OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Towards Enhancing Coding Productivity for GPU Programming Using Static Graphs

Journal Article · Electronics

The main contribution of this work is to increase the coding productivity of GPU programming through the concept of Static Graphs. GPU performance and memory capacity have increased significantly, yet scalability problems remain, along with limits on the amount of work a GPU can perform at a time. To minimize the overhead of launching GPU kernels and to maximize the use of GPU capacity, we combine the new CUDA Graph API with the CUDA programming model (including the CUDA math libraries) and with the OpenACC programming model. As test cases we use two well-known problems that are widely used in HPC and AI: the Conjugate Gradient method and Particle Swarm Optimization. In the first test case (Conjugate Gradient), we focus on integrating Static Graphs with CUDA. Here we significantly outperform the NVIDIA reference code, reaching a speedup of up to 11x thanks to a better implementation that benefits from the new CUDA Graph capabilities. In the second test case (Particle Swarm Optimization), we complement the OpenACC functionality with CUDA Graph, again achieving speedups of up to one order of magnitude, with average speedups ranging from 2x to 4x and performance very close to that of an optimized reference CUDA code. Our main goal is a more productive coding model for GPU programming based on Static Graphs, which exploits GPU capacity better in a way that is largely transparent to the programmer. Combining Static Graphs with two of the most important current GPU programming models (CUDA and OpenACC) considerably reduces execution time with respect to using CUDA or OpenACC alone, reaching speedups of more than one order of magnitude in the best cases. Finally, we propose an interface for incorporating the concept of Static Graphs into the OpenACC specification.
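As a rough illustration of the launch-overhead idea described in the abstract, the following minimal sketch captures a short sequence of kernel launches into a CUDA graph once and then replays it many times with a single launch call per replay. It is not the authors' Conjugate Gradient or Particle Swarm Optimization code: the kernel `add_one`, the `CHECK` macro, and the sizes `n`, `kernelsPerGraph`, and `replays` are hypothetical names chosen for this example. It assumes CUDA 11.4 or newer, where `cudaGraphInstantiateWithFlags` is available.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper for this sketch: report a CUDA error and exit main().
#define CHECK(call)                                              \
    do {                                                         \
        cudaError_t err_ = (call);                               \
        if (err_ != cudaSuccess) {                               \
            std::fprintf(stderr, "CUDA error: %s\n",             \
                         cudaGetErrorString(err_));              \
            return 1;                                            \
        }                                                        \
    } while (0)

// Hypothetical toy kernel: adds 1 to every element of x.
__global__ void add_one(float* x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] += 1.0f;
}

int main() {
    const int n = 1 << 16;          // vector length (assumption)
    const int kernelsPerGraph = 3;  // kernels recorded in one graph
    const int replays = 100;        // how many times the graph is launched
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    float* d_x = nullptr;
    CHECK(cudaMalloc(&d_x, n * sizeof(float)));
    CHECK(cudaMemset(d_x, 0, n * sizeof(float)));  // all-zero bits are 0.0f
    CHECK(cudaDeviceSynchronize());

    cudaStream_t stream;
    CHECK(cudaStreamCreate(&stream));

    // Record the kernel sequence once. During capture the kernels are
    // not executed; they become nodes of the graph.
    cudaGraph_t graph;
    cudaGraphExec_t graphExec;
    CHECK(cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal));
    for (int k = 0; k < kernelsPerGraph; ++k) {
        add_one<<<blocks, threads, 0, stream>>>(d_x, n);
    }
    CHECK(cudaStreamEndCapture(stream, &graph));
    CHECK(cudaGraphInstantiateWithFlags(&graphExec, graph, 0));

    // Replay: one cudaGraphLaunch call submits all recorded kernels.
    for (int r = 0; r < replays; ++r) {
        CHECK(cudaGraphLaunch(graphExec, stream));
    }
    CHECK(cudaStreamSynchronize(stream));

    // Each element should have been incremented kernelsPerGraph * replays times.
    float h_first = 0.0f;
    CHECK(cudaMemcpy(&h_first, d_x, sizeof(float), cudaMemcpyDeviceToHost));
    const float expected = static_cast<float>(kernelsPerGraph * replays);
    std::printf("x[0] = %.1f (%s)\n", h_first,
                h_first == expected ? "matches expected" : "mismatch");

    CHECK(cudaGraphExecDestroy(graphExec));
    CHECK(cudaGraphDestroy(graph));
    CHECK(cudaStreamDestroy(stream));
    CHECK(cudaFree(d_x));
    return 0;
}
```

Stream capture is used here because it lets existing stream-based code be turned into a graph with few changes. The CUDA Graph API also allows building the graph node by node, and this record does not say which of the two the paper uses.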

Research Organization:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Sponsoring Organization:
USDOE Office of Science (SC); European Union’s Horizon 2020
Grant/Contract Number:
AC05-00OR22725; 801051
OSTI ID:
1883753
Journal Information:
Electronics, Vol. 11, Issue 9; ISSN 2079-9292
Publisher:
MDPI
Country of Publication:
United States
Language:
English
