DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Automatic Differentiation of C++ Codes on Emerging Manycore Architectures with Sacado

Abstract

Automatic differentiation (AD) is a well-known technique for evaluating analytic derivatives of calculations implemented on a computer, with numerous software tools available for incorporating AD technology into complex applications. However, a growing challenge for AD is the efficient differentiation of parallel computations implemented on emerging manycore computing architectures such as multicore CPUs, GPUs, and accelerators as these devices become more pervasive. In this work, we explore forward mode, operator overloading-based differentiation of C++ codes on these architectures using the widely available Sacado AD software package. In particular, we leverage Kokkos, a C++ tool providing APIs for implementing parallel computations that is portable to a wide variety of emerging architectures. Here we describe the challenges that arise when differentiating code for these architectures using Kokkos, and two approaches for overcoming them that ensure optimal memory access patterns as well as expose additional dimensions of fine-grained parallelism in the derivative calculation. We describe the results of several computational experiments that demonstrate the performance of the approach on a few contemporary CPU and GPU architectures. We then conclude with applications of these techniques to the simulation of discretized systems of partial differential equations.
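The forward mode, operator overloading approach named in the abstract can be illustrated with a minimal C++ sketch. This is not Sacado's API: the `Dual` type, `make_independent`, and the test function `f` are hypothetical names introduced here for illustration only. The sketch assumes a derivative array whose length is fixed at compile time and propagates it through `+`, `*`, and `sin` using the sum, product, and chain rules.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical forward-mode AD scalar (illustrative only, not Sacado's API):
// a value together with N partial derivatives, with N fixed at compile time.
template <std::size_t N>
struct Dual {
  double val = 0.0;
  std::array<double, N> dx{};  // zero-initialized derivative components
};

// Seed independent variable number i, so that d(x_i)/d(x_i) = 1.
template <std::size_t N>
Dual<N> make_independent(double v, std::size_t i) {
  assert(i < N);
  Dual<N> x;
  x.val = v;
  x.dx[i] = 1.0;
  return x;
}

// Sum rule: (a + b)' = a' + b'.
template <std::size_t N>
Dual<N> operator+(const Dual<N>& a, const Dual<N>& b) {
  Dual<N> r;
  r.val = a.val + b.val;
  for (std::size_t k = 0; k < N; ++k) r.dx[k] = a.dx[k] + b.dx[k];
  return r;
}

// Product rule: (a * b)' = a' * b + a * b'.
template <std::size_t N>
Dual<N> operator*(const Dual<N>& a, const Dual<N>& b) {
  Dual<N> r;
  r.val = a.val * b.val;
  for (std::size_t k = 0; k < N; ++k)
    r.dx[k] = a.dx[k] * b.val + a.val * b.dx[k];
  return r;
}

// Chain rule: sin(a)' = cos(a) * a'.
template <std::size_t N>
Dual<N> sin(const Dual<N>& a) {
  Dual<N> r;
  r.val = std::sin(a.val);
  const double c = std::cos(a.val);
  for (std::size_t k = 0; k < N; ++k) r.dx[k] = c * a.dx[k];
  return r;
}

// User code written once against a generic scalar type T; it works for
// plain double and, through the overloads above, for Dual<N>.
template <class T>
T f(const T& x, const T& y) {
  using std::sin;  // lets the call resolve for double as well as Dual<N>
  return x * y + sin(x);
}
```

Calling `f(make_independent<2>(2.0, 0), make_independent<2>(3.0, 1))` evaluates f(x, y) = x*y + sin(x) at (2, 3); the result's `dx[0]` holds y + cos(x) and `dx[1]` holds x. The inner loops over the derivative components are the kind of extra fine-grained work that the abstract describes mapping onto parallel hardware, although how Sacado and Kokkos actually do so is not shown here.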

Authors:
Phipps, Eric T. [1]; Pawlowski, Roger P. [1]; Trott, Christian Robert [1]
  1. Sandia National Laboratories (SNL-NM), Albuquerque, NM (United States)
Publication Date:
2022-12-19
Research Org.:
Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
Sponsoring Org.:
USDOE National Nuclear Security Administration (NNSA)
OSTI Identifier:
2311756
Report Number(s):
SAND-2023-10470J
Journal ID: ISSN 0098-3500
Grant/Contract Number:  
NA0003525
Resource Type:
Accepted Manuscript
Journal Name:
ACM Transactions on Mathematical Software
Additional Journal Information:
Journal Volume: 48; Journal Issue: 4; Journal ID: ISSN 0098-3500
Publisher:
Association for Computing Machinery
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; automatic differentiation; multicore; manycore; GPU; threads; CUDA; OpenMP

Citation Formats

Phipps, Eric T., Pawlowski, Roger P., and Trott, Christian Robert. Automatic Differentiation of C++ Codes on Emerging Manycore Architectures with Sacado. United States: N. p., 2022. Web. doi:10.1145/3560262.
Phipps, Eric T., Pawlowski, Roger P., & Trott, Christian Robert. Automatic Differentiation of C++ Codes on Emerging Manycore Architectures with Sacado. United States. https://doi.org/10.1145/3560262
Phipps, Eric T., Pawlowski, Roger P., and Trott, Christian Robert. 2022. "Automatic Differentiation of C++ Codes on Emerging Manycore Architectures with Sacado". United States. https://doi.org/10.1145/3560262. https://www.osti.gov/servlets/purl/2311756.
@article{osti_2311756,
title = {Automatic Differentiation of C++ Codes on Emerging Manycore Architectures with Sacado},
author = {Phipps, Eric T. and Pawlowski, Roger P. and Trott, Christian Robert},
abstractNote = {Automatic differentiation (AD) is a well-known technique for evaluating analytic derivatives of calculations implemented on a computer, with numerous software tools available for incorporating AD technology into complex applications. However, a growing challenge for AD is the efficient differentiation of parallel computations implemented on emerging manycore computing architectures such as multicore CPUs, GPUs, and accelerators as these devices become more pervasive. In this work, we explore forward mode, operator overloading-based differentiation of C++ codes on these architectures using the widely available Sacado AD software package. In particular, we leverage Kokkos, a C++ tool providing APIs for implementing parallel computations that is portable to a wide variety of emerging architectures. Here we describe the challenges that arise when differentiating code for these architectures using Kokkos, and two approaches for overcoming them that ensure optimal memory access patterns as well as expose additional dimensions of fine-grained parallelism in the derivative calculation. We describe the results of several computational experiments that demonstrate the performance of the approach on a few contemporary CPU and GPU architectures. We then conclude with applications of these techniques to the simulation of discretized systems of partial differential equations.},
doi = {10.1145/3560262},
journal = {ACM Transactions on Mathematical Software},
number = 4,
volume = 48,
place = {United States},
year = {2022},
month = {12}
}
