Automatic Differentiation of C++ Codes on Emerging Manycore Architectures with Sacado
Abstract
Automatic differentiation (AD) is a well-known technique for evaluating analytic derivatives of calculations implemented on a computer, with numerous software tools available for incorporating AD technology into complex applications. However, a growing challenge for AD is the efficient differentiation of parallel computations implemented on emerging manycore computing architectures such as multicore CPUs, GPUs, and accelerators as these devices become more pervasive. In this work, we explore forward mode, operator overloading-based differentiation of C++ codes on these architectures using the widely available Sacado AD software package. In particular, we leverage Kokkos, a C++ tool providing APIs for implementing parallel computations that is portable to a wide variety of emerging architectures. Here we describe the challenges that arise when differentiating code for these architectures using Kokkos, and two approaches for overcoming them that ensure optimal memory access patterns as well as expose additional dimensions of fine-grained parallelism in the derivative calculation. We describe the results of several computational experiments that demonstrate the performance of the approach on a few contemporary CPU and GPU architectures. We then conclude with applications of these techniques to the simulation of discretized systems of partial differential equations.
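As a rough illustration of the forward mode, operator overloading-based approach named in the abstract, the following C++ sketch propagates a value together with a fixed-length array of partial derivatives through templated user code. It is not Sacado's implementation or API: the namespace `sketch`, the type `Dual`, and the functions `make_independent` and `f` are hypothetical names chosen for this example, and only three operations are overloaded.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

namespace sketch {

// Hypothetical forward-mode AD scalar (not Sacado's class):
// a value plus N partial derivatives stored contiguously.
template <std::size_t N>
struct Dual {
  double val;                // function value
  std::array<double, N> dx;  // partials w.r.t. the N independent variables
};

// Seed the i-th independent variable: d(x_i)/d(x_i) = 1, all others 0.
template <std::size_t N>
Dual<N> make_independent(double value, std::size_t i) {
  Dual<N> x{value, {}};
  x.dx[i] = 1.0;
  return x;
}

// Sum rule: (a + b)' = a' + b'
template <std::size_t N>
Dual<N> operator+(const Dual<N>& a, const Dual<N>& b) {
  Dual<N> r{a.val + b.val, {}};
  for (std::size_t k = 0; k < N; ++k) r.dx[k] = a.dx[k] + b.dx[k];
  return r;
}

// Product rule: (a * b)' = a' * b + a * b'
template <std::size_t N>
Dual<N> operator*(const Dual<N>& a, const Dual<N>& b) {
  Dual<N> r{a.val * b.val, {}};
  for (std::size_t k = 0; k < N; ++k)
    r.dx[k] = a.dx[k] * b.val + a.val * b.dx[k];
  return r;
}

// Chain rule for sine: sin(a)' = cos(a) * a'
template <std::size_t N>
Dual<N> sin(const Dual<N>& a) {
  Dual<N> r{std::sin(a.val), {}};
  const double c = std::cos(a.val);
  for (std::size_t k = 0; k < N; ++k) r.dx[k] = c * a.dx[k];
  return r;
}

// User code written once, templated on the scalar type, so the same
// source evaluates with double (values only) or Dual<N> (values and
// derivatives).
template <typename T>
T f(const T& x, const T& y) {
  using std::sin;  // double uses std::sin; Dual<N> finds sketch::sin via ADL
  return x * y + sin(x);
}

}  // namespace sketch
```

Seeding `x = sketch::make_independent<2>(2.0, 0)` and `y = sketch::make_independent<2>(3.0, 1)` and calling `sketch::f(x, y)` returns the function value in `val` and the partials y + cos(x) and x in `dx[0]` and `dx[1]`. The short loops over `k` run over the derivative components, which is presumably the kind of extra dimension the abstract refers to when it mentions fine-grained parallelism in the derivative calculation.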
- Authors:
- Phipps, Eric T.; Pawlowski, Roger P.; Trott, Christian Robert
- Sandia National Laboratories (SNL-NM), Albuquerque, NM (United States)
- Publication Date:
- 2022-12-19
- Research Org.:
- Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
- Sponsoring Org.:
- USDOE National Nuclear Security Administration (NNSA)
- OSTI Identifier:
- 2311756
- Report Number(s):
- SAND-2023-10470J; Journal ID: ISSN 0098-3500
- Grant/Contract Number:
- NA0003525
- Resource Type:
- Accepted Manuscript
- Journal Name:
- ACM Transactions on Mathematical Software
- Additional Journal Information:
- Journal Volume: 48; Journal Issue: 4; Journal ID: ISSN 0098-3500
- Publisher:
- Association for Computing Machinery
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 97 MATHEMATICS AND COMPUTING; automatic differentiation; multicore; manycore; GPU; threads; CUDA; OpenMP
Citation Formats
Phipps, Eric T., Pawlowski, Roger P., and Trott, Christian Robert. Automatic Differentiation of C++ Codes on Emerging Manycore Architectures with Sacado. United States: N. p., 2022. Web. doi:10.1145/3560262.
Phipps, Eric T., Pawlowski, Roger P., & Trott, Christian Robert. Automatic Differentiation of C++ Codes on Emerging Manycore Architectures with Sacado. United States. https://doi.org/10.1145/3560262
Phipps, Eric T., Pawlowski, Roger P., and Trott, Christian Robert. 2022. "Automatic Differentiation of C++ Codes on Emerging Manycore Architectures with Sacado". United States. https://doi.org/10.1145/3560262. https://www.osti.gov/servlets/purl/2311756.
@article{osti_2311756,
title = {Automatic Differentiation of C++ Codes on Emerging Manycore Architectures with Sacado},
author = {Phipps, Eric T. and Pawlowski, Roger P. and Trott, Christian Robert},
abstractNote = {Automatic differentiation (AD) is a well-known technique for evaluating analytic derivatives of calculations implemented on a computer, with numerous software tools available for incorporating AD technology into complex applications. However, a growing challenge for AD is the efficient differentiation of parallel computations implemented on emerging manycore computing architectures such as multicore CPUs, GPUs, and accelerators as these devices become more pervasive. In this work, we explore forward mode, operator overloading-based differentiation of C++ codes on these architectures using the widely available Sacado AD software package. In particular, we leverage Kokkos, a C++ tool providing APIs for implementing parallel computations that is portable to a wide variety of emerging architectures. Here we describe the challenges that arise when differentiating code for these architectures using Kokkos, and two approaches for overcoming them that ensure optimal memory access patterns as well as expose additional dimensions of fine-grained parallelism in the derivative calculation. We describe the results of several computational experiments that demonstrate the performance of the approach on a few contemporary CPU and GPU architectures. We then conclude with applications of these techniques to the simulation of discretized systems of partial differential equations.},
doi = {10.1145/3560262},
journal = {ACM Transactions on Mathematical Software},
number = 4,
volume = 48,
place = {United States},
year = {2022},
month = {12}
}