On a Simplified Approach to Achieve Parallel Performance and Portability Across CPU and GPU Architectures

Morgan, Nathaniel Ray; Yenusah, Caleb Onuh; Diaz, Adrian; Dunning, Daniel; Moore, Jacob; Heilman, Erin Kathleen; Roth, Calvin; Lieberman, Evan; Walton, Steven Taylor; Brown, Sarah; Holladay, Daniel Alphin; Knezevic, Marko; Whetstone, Gavin; Baker, Zachary; Robey, Robert W.

doi:10.3390/info15110673

Journal Article · October 28, 2024 · Information

DOI: https://doi.org/10.3390/info15110673 · OSTI ID: 2475228

Morgan, Nathaniel Ray ^[1]; Yenusah, Caleb Onuh ^[1]; Diaz, Adrian ^[1]; Dunning, Daniel ^[1]; Moore, Jacob ^[2]; Heilman, Erin Kathleen ^[1]; Roth, Calvin ^[3]; Lieberman, Evan ^[1]; Walton, Steven Taylor ^[1]; Brown, Sarah ^[1]; Holladay, Daniel Alphin ^[1]; Knezevic, Marko ^[4]; Whetstone, Gavin ^[5]; Baker, Zachary ^[6]; Robey, Robert W. ^[7]

[1] Los Alamos National Laboratory (LANL), Los Alamos, NM (United States)
[2] Los Alamos National Laboratory (LANL), Los Alamos, NM (United States); Mississippi State University, Mississippi State, MS (United States)
[3] Los Alamos National Laboratory (LANL), Los Alamos, NM (United States); University of Minnesota, Minneapolis, MN (United States)
[4] University of New Hampshire, Durham, NH (United States)
[5] Los Alamos National Laboratory (LANL), Los Alamos, NM (United States); Texas A&M University, College Station, TX (United States)
[6] Los Alamos National Laboratory (LANL), Los Alamos, NM (United States); University of Colorado, Boulder, CO (United States)
[7] Los Alamos National Laboratory (LANL), Los Alamos, NM (United States); AMD Corporation, Santa Clara, CA (United States)

This paper presents software advances to easily exploit computer architectures consisting of a multi-core CPU and CPU+GPU to accelerate diverse types of high-performance computing (HPC) applications using a single code implementation. The paper describes and demonstrates the performance of the open-source C++ matrix and array (MATAR) library that uniquely offers: (1) a straightforward syntax for programming productivity, (2) usable data structures for data-oriented programming (DOP) for performance, and (3) a simple interface to the open-source C++ Kokkos library for portability and memory management across CPUs and GPUs. The portability across architectures with a single code implementation is achieved by automatically switching between diverse fine-grained parallelism backends (e.g., CUDA, HIP, OpenMP, pthreads, etc.) at compile time. The MATAR library solves many longstanding challenges associated with easily writing software that can run in parallel on any computer architecture. This work benefits projects seeking to write new C++ codes while also addressing the challenges of quickly making existing Fortran codes performant and portable over modern computer architectures with minimal syntactical changes from Fortran to C++. We demonstrate the feasibility of readily writing new C++ codes and modernizing existing codes with MATAR to be performant, parallel, and portable across diverse computer architectures.
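The compile-time backend switching described in the abstract can be illustrated with a minimal standard-C++ sketch. It is not MATAR or Kokkos code: the names `for_all`, `Array2D`, and `fill_and_sum` are hypothetical, and the only two "backends" assumed here are OpenMP threads (selected when the compiler defines `_OPENMP`, for example with `-fopenmp`) and a plain serial loop. The array class uses Fortran-like `a(i, j)` indexing over contiguous storage, in the spirit of the data-oriented structures the abstract mentions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical names throughout; this is NOT the MATAR or Kokkos API.

// Loop abstraction whose backend is fixed at compile time: OpenMP threads
// when the compiler defines _OPENMP, otherwise an ordinary serial loop.
template <typename Body>
void for_all(std::size_t begin, std::size_t end, Body body) {
#ifdef _OPENMP
    #pragma omp parallel for
    for (std::ptrdiff_t i = static_cast<std::ptrdiff_t>(begin);
         i < static_cast<std::ptrdiff_t>(end); ++i) {
        body(static_cast<std::size_t>(i));
    }
#else
    for (std::size_t i = begin; i < end; ++i) {
        body(i);
    }
#endif
}

// Dense 2D array with a(i, j) access over one contiguous row-major buffer.
class Array2D {
public:
    Array2D(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols, 0.0) {}

    double& operator()(std::size_t i, std::size_t j) {
        assert(i < rows_ && j < cols_);  // bounds check in debug builds
        return data_[i * cols_ + j];
    }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

private:
    std::size_t rows_;
    std::size_t cols_;
    std::vector<double> data_;
};

// The same source runs on either backend: rows are filled in parallel
// (each iteration writes only its own row), then summed serially.
double fill_and_sum(std::size_t n) {
    Array2D a(n, n);
    for_all(0, n, [&a, n](std::size_t i) {
        for (std::size_t j = 0; j < n; ++j) {
            a(i, j) = static_cast<double>(i + j);
        }
    });

    double total = 0.0;
    for (std::size_t i = 0; i < a.rows(); ++i) {
        for (std::size_t j = 0; j < a.cols(); ++j) {
            total += a(i, j);
        }
    }
    return total;
}
```

Calling `fill_and_sum(4)` gives the same result whether or not OpenMP is enabled, because the backend choice changes only how the loop iterations are scheduled. A library such as Kokkos extends this idea to GPU backends and to memory placement, which this sketch does not attempt.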

Research Organization: Los Alamos National Laboratory (LANL), Los Alamos, NM (United States)

Sponsoring Organization: USDOE Laboratory Directed Research and Development (LDRD) Program; USDOE National Nuclear Security Administration (NNSA); USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)

Grant/Contract Number: 89233218CNA000001

OSTI ID: 2475228

Alternate ID(s): OSTI ID: 2558047

Report Number(s): LA-UR--22-20105

Journal Information: Information, Vol. 15, Issue 11; ISSN 2078-2489

Publisher: MDPI

Country of Publication: United States

Language: English

Related Subjects

97 MATHEMATICS AND COMPUTING
GPUs
dense and sparse data
fine-grained parallelism
performance
portability
productivity
