DOE PAGES · U.S. Department of Energy, Office of Scientific and Technical Information

Title: Efficient GPU Implementation of Automatic Differentiation for Computational Fluid Dynamics

Journal Article · Proceedings ... International Conference on High Performance Computing (Online)
Authors: [1]; [1]; [2]; [2]; [2]; [3]; [4]; [5]; [6]
Affiliations:
  1. Old Dominion Univ., Norfolk, VA (United States)
  2. NASA Langley Research Center, Hampton, VA (United States)
  3. National Institute of Aerospace, Hampton, VA (United States)
  4. Fermi National Accelerator Laboratory (FNAL), Batavia, IL (United States)
  5. Northwestern Univ., Evanston, IL (United States)
  6. Univ. of Maryland, College Park, MD (United States)

Many scientific and engineering applications require repeated calculations of derivatives of output functions with respect to input parameters. Automatic Differentiation (AD) is a method that automates derivative calculations and can significantly speed up code development. In Computational Fluid Dynamics (CFD), derivatives of flux functions with respect to state variables (the flux Jacobian) are needed for efficient solutions of the nonlinear governing equations. AD of flux functions on graphics processing units (GPUs) is challenging because flux computations involve many intermediate variables that create high register pressure and require significant memory traffic to store the derivatives. This paper presents a forward-mode AD method based on multivariate dual numbers that addresses these challenges and simultaneously reduces the floating-point operation count. The dimension of the multivariate dual numbers is optimized for performance. The flux computations are restructured to minimize the number of temporary variables and reduce register pressure. For effective utilization of memory bandwidth, shared memory is used to store the local flux Jacobian. This AD implementation is compared with several other Jacobian implementations on an NVIDIA V100 GPU (V100). For the three-dimensional perfect-gas compressible-flow equations implemented in a practical CFD code, the AD implementation of the flux Jacobian based on multivariate dual numbers of dimension 5 outperforms all other GPU AD implementations on the V100. Its performance is comparable to the optimized hand-differentiated version. Finally, the implementation achieves 75% of the peak floating-point throughput and 61% of the peak global device memory bandwidth.
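The core technique named in the abstract, forward-mode AD with multivariate dual numbers of dimension 5 (one derivative direction per conserved variable of the 3-D perfect-gas equations), can be illustrated with a minimal CUDA C++ sketch. This is not the paper's implementation: the type name Dual5, the toy function f, and the demo kernel are assumptions introduced here for illustration, and the paper's restructuring of the flux routines and use of shared memory for the local Jacobian are omitted.

// Illustrative sketch (not the paper's code): a multivariate dual number of
// dimension 5. Each arithmetic operation propagates the value and its five
// partial derivatives, so evaluating a routine on Dual5 inputs yields one
// block of the Jacobian without hand-written derivative code.
#include <cstdio>
#include <cuda_runtime.h>

struct Dual5 {
    double v;     // primal value
    double d[5];  // partial derivatives w.r.t. the 5 state variables

    __host__ __device__ Dual5(double val = 0.0) : v(val) {
        for (int i = 0; i < 5; ++i) d[i] = 0.0;
    }
    // Seed the k-th derivative direction (identity column of the seed matrix).
    __host__ __device__ static Dual5 seed(double val, int k) {
        Dual5 x(val);
        x.d[k] = 1.0;
        return x;
    }
};

__host__ __device__ inline Dual5 operator+(const Dual5& a, const Dual5& b) {
    Dual5 r(a.v + b.v);
    for (int i = 0; i < 5; ++i) r.d[i] = a.d[i] + b.d[i];
    return r;
}

__host__ __device__ inline Dual5 operator*(const Dual5& a, const Dual5& b) {
    Dual5 r(a.v * b.v);  // product rule for the derivative part
    for (int i = 0; i < 5; ++i) r.d[i] = a.d[i] * b.v + a.v * b.d[i];
    return r;
}

// Hypothetical demo: differentiate f(q) = q0*q1 + q2 w.r.t. the state vector q.
__global__ void demo(double* jac_row) {
    Dual5 q0 = Dual5::seed(2.0, 0);
    Dual5 q1 = Dual5::seed(3.0, 1);
    Dual5 q2 = Dual5::seed(5.0, 2);
    Dual5 f  = q0 * q1 + q2;
    for (int i = 0; i < 5; ++i) jac_row[i] = f.d[i];  // df/dq_i
}

int main() {
    double* jac;
    cudaMallocManaged(&jac, 5 * sizeof(double));
    demo<<<1, 1>>>(jac);
    cudaDeviceSynchronize();
    for (int i = 0; i < 5; ++i) printf("df/dq%d = %g\n", i, jac[i]);  // 3, 2, 1, 0, 0
    cudaFree(jac);
    return 0;
}

In this sketch the seed matrix is the identity, so one evaluation of f on dual inputs produces a full row of derivatives; the paper additionally tunes the dual-number dimension and stages the resulting local flux Jacobian in shared memory to limit register pressure and global memory traffic.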

Research Organization:
Thomas Jefferson National Accelerator Facility (TJNAF), Newport News, VA (United States); Fermi National Accelerator Laboratory (FNAL), Batavia, IL (United States)
Sponsoring Organization:
USDOE Office of Science (SC), High Energy Physics (HEP); National Institute of Aerospace
Grant/Contract Number:
AC02-07CH11359; AC05-00OR22725
OSTI ID:
1993463
Report Number(s):
FERMILAB-CONF--23-342-CSAID; oai:inspirehep.net:2679722
Journal Information:
Proceedings ... International Conference on High Performance Computing (Online), Vol. 2023; ISSN 2640-0316
Publisher:
IEEE
Country of Publication:
United States
Language:
English

References (20)

Automatic Differentiation for Adjoint Stencil Loops conference August 2019
AutoMat: automatic differentiation for generalized standard materials on GPUs journal November 2021
Numerical Differentiation of Analytic Functions journal June 1967
A Bibliography of Automatic Differentiation book January 2006
Algorithm 799: revolve: an implementation of checkpointing for the reverse or adjoint mode of computational differentiation journal March 2000
Vector Forward Mode Automatic Differentiation on SIMD/SIMT architectures conference August 2020
Iterative Methods for Sparse Linear Systems book January 2003
GPU Accelerated Automatic Differentiation With Clad journal February 2023
Reverse-Mode Automatic Differentiation and Optimization of GPU Kernels via Enzyme (Moses, Churavy, Paehler; https://doi.org/10.1145/3458817.3476165) conference November 2021
Fast Reverse-Mode Automatic Differentiation using Expression Templates in C++ journal June 2014
Discrete Adjoint-Based Design for Unsteady Turbulent Flows on Dynamic Overset Unstructured Grids journal June 2013
Numerical algorithms based on the theory of complex variable conference January 1967
Algorithmic Differentiation of Numerical Methods journal October 2015
Implementation of automatic differentiation tools journal January 2002
Kokkos 3: Programming Model Extensions for the Exascale Era journal January 2021
Data Summary from Second AIAA Computational Fluid Dynamics Drag Prediction Workshop journal September 2005
LLVM code optimisation for automatic differentiation conference June 2022
Automatic Differentiation of C++ Codes on Emerging Manycore Architectures with Sacado journal December 2022
Approximate Riemann solvers, parameter vectors, and difference schemes journal October 1981
The Tapenade automatic differentiation tool: Principles, model, and specification journal April 2013