On a Simplified Approach to Achieve Parallel Performance and Portability Across CPU and GPU Architectures
- Los Alamos National Laboratory (LANL), Los Alamos, NM (United States)
- Los Alamos National Laboratory (LANL), Los Alamos, NM (United States); Mississippi State University, Mississippi State, MS (United States)
- Los Alamos National Laboratory (LANL), Los Alamos, NM (United States); University of Minnesota, Minneapolis, MN (United States)
- University of New Hampshire, Durham, NH (United States)
- Los Alamos National Laboratory (LANL), Los Alamos, NM (United States); Texas A & M University, College Station, TX (United States)
- Los Alamos National Laboratory (LANL), Los Alamos, NM (United States); University of Colorado, Boulder, CO (United States)
- Los Alamos National Laboratory (LANL), Los Alamos, NM (United States); AMD corporation, Santa Clara, CA (United States)
This paper presents software advances that make it easy to exploit multi-core CPU and CPU+GPU architectures to accelerate diverse high-performance computing (HPC) applications from a single code implementation. The paper describes and demonstrates the performance of the open-source C++ matrix and array (MATAR) library, which uniquely offers: (1) a straightforward syntax for programming productivity, (2) usable data structures for data-oriented programming (DOP) for performance, and (3) a simple interface to the open-source C++ Kokkos library for portability and memory management across CPUs and GPUs. Portability across architectures with a single code implementation is achieved by automatically switching between diverse fine-grained parallelism backends (e.g., CUDA, HIP, OpenMP, and pthreads) at compile time. The MATAR library solves many longstanding challenges associated with easily writing software that can run in parallel on any computer architecture. This work benefits projects seeking to write new C++ codes, and it also addresses the challenge of quickly making existing Fortran codes performant and portable on modern computer architectures with minimal syntactical changes from Fortran to C++. We demonstrate the feasibility of readily writing new C++ codes and modernizing existing codes with MATAR so that they are performant, parallel, and portable across diverse computer architectures.
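The two ideas in the abstract, compile-time backend selection and Fortran-like array syntax in C++, can be illustrated with a minimal, self-contained sketch. This is not the MATAR or Kokkos API: the names `for_all`, `FortranMatrix2D`, and `fill_and_sum` are hypothetical, and the only "backends" shown are OpenMP and a serial fallback, chosen by whether the compiler defines `_OPENMP`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (assumption, not a MATAR or Kokkos function):
// the same loop body runs under OpenMP when the code is compiled with
// OpenMP enabled, and serially otherwise. The choice is made at compile time.
template <typename Body>
void for_all(std::ptrdiff_t begin, std::ptrdiff_t end, Body body) {
#if defined(_OPENMP)
    #pragma omp parallel for
#endif
    for (std::ptrdiff_t i = begin; i < end; ++i) {
        body(i);
    }
}

// Hypothetical 2D array with Fortran conventions: 1-based indices and
// column-major storage, so a(i, j) reads like Fortran's A(i, j).
class FortranMatrix2D {
public:
    FortranMatrix2D(std::ptrdiff_t rows, std::ptrdiff_t cols)
        : rows_(rows), cols_(cols),
          data_(static_cast<std::size_t>(rows * cols), 0.0) {}

    double& operator()(std::ptrdiff_t i, std::ptrdiff_t j) {
        assert(i >= 1 && i <= rows_ && j >= 1 && j <= cols_);
        // Column-major offset: the row index varies fastest in memory.
        return data_[static_cast<std::size_t>((i - 1) + (j - 1) * rows_)];
    }

    const double* data() const { return data_.data(); }

private:
    std::ptrdiff_t rows_;
    std::ptrdiff_t cols_;
    std::vector<double> data_;
};

// Fills a(i, j) = i + 10 * j, parallelizing over columns, then sums serially.
// Each iteration of the parallel loop writes a distinct column, so there is
// no data race between threads.
double fill_and_sum(std::ptrdiff_t rows, std::ptrdiff_t cols) {
    FortranMatrix2D a(rows, cols);
    for_all(1, cols + 1, [&a, rows](std::ptrdiff_t j) {
        for (std::ptrdiff_t i = 1; i <= rows; ++i) {
            a(i, j) = static_cast<double>(i + 10 * j);
        }
    });
    double sum = 0.0;
    for (std::ptrdiff_t j = 1; j <= cols; ++j) {
        for (std::ptrdiff_t i = 1; i <= rows; ++i) {
            sum += a(i, j);
        }
    }
    return sum;
}
```

Building the same source with or without an OpenMP flag (for example `-fopenmp` on GCC or Clang) changes only how the loop executes, not the calling code. A library such as the one described in the abstract would extend this idea to GPU backends and device memory management, which this sketch does not attempt.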
- Research Organization:
- Los Alamos National Laboratory (LANL), Los Alamos, NM (United States)
- Sponsoring Organization:
- USDOE Laboratory Directed Research and Development (LDRD) Program; USDOE National Nuclear Security Administration (NNSA); USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)
- Grant/Contract Number:
- 89233218CNA000001
- OSTI ID:
- 2475228
- Alternate ID(s):
- OSTI ID: 2558047
- Report Number(s):
- LA-UR--22-20105
- Journal Information:
- Journal Name: Information; Journal Issue: 11; Vol. 15; ISSN 2078-2489
- Publisher:
- MDPI
- Country of Publication:
- United States
- Language:
- English
Similar Records
- Enabling Parallel Performance and Portability of Solid Mechanics Simulations Across CPU and GPU Architectures (Journal Article · Wed Nov 06 19:00:00 EST 2024 · Information · OSTI ID: 2476578)