OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Kokkos: Enabling manycore performance portability through polymorphic memory access patterns

Journal Article · Journal of Parallel and Distributed Computing
 [1];  [1];  [1]
  1. Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

The manycore revolution can be characterized by increasing thread counts, decreasing memory per thread, and a diversity of continually evolving manycore architectures. High performance computing (HPC) applications and libraries must exploit increasingly fine levels of parallelism within their codes to sustain scalability on these devices. We found that a major obstacle to performance portability is the diverse and conflicting set of constraints on memory access patterns across devices. Contemporary portable programming models address manycore parallelism (e.g., OpenMP, OpenACC, OpenCL) but fail to address memory access patterns. The Kokkos C++ library enables applications and domain libraries to achieve performance portability on diverse manycore architectures by unifying abstractions for both fine-grain data parallelism and memory access patterns. In this paper we describe Kokkos’ abstractions, summarize its application programmer interface (API), present performance results for unit-test kernels and mini-applications, and outline an incremental strategy for migrating legacy C++ codes to Kokkos. Furthermore, the Kokkos library is under active research and development to incorporate capabilities from new generations of manycore architectures, and to address a growing list of applications and domain libraries.

Research Organization:
Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
Sponsoring Organization:
USDOE National Nuclear Security Administration (NNSA)
Grant/Contract Number:
AC04-94AL85000
OSTI ID:
1106586
Alternate ID(s):
OSTI ID: 1556442
Report Number(s):
SAND-2013-5603J; PII: S0743731514001257
Journal Information:
Journal of Parallel and Distributed Computing, Vol. 74, Issue 12; ISSN 0743-7315
Publisher:
Elsevier
Country of Publication:
United States
Language:
English
Citation Metrics:
Cited by: 449 works
Citation information provided by Web of Science

References (10)

StarPU: a unified platform for task scheduling on heterogeneous multicore architectures journal November 2010
hwloc: A Generic Framework for Managing Hardware Affinities in HPC Applications
  • Broquedis, François; Clet-Ortega, Jerome; Moreaud, Stephanie
  • 18th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP 2010), 2010 18th Euromicro Conference on Parallel, Distributed and Network-based Processing https://doi.org/10.1109/PDP.2010.67
conference February 2010
A class of parallel tiled linear algebra algorithms for multicore architectures journal January 2009
OmpSs: A PROPOSAL FOR PROGRAMMING HETEROGENEOUS MULTI-CORE ARCHITECTURES journal June 2011
Kokkos Array performance-portable manycore programming model
  • Edwards, H. Carter; Sunderland, Daniel
  • Proceedings of the 2012 International Workshop on Programming Models and Applications for Multicores and Manycores - PMAM '12 https://doi.org/10.1145/2141702.2141703
conference January 2012
XKaapi: A Runtime System for Data-Flow Task Programming on Heterogeneous Architectures
  • Gautier, Thierry; Lima, Joao V. F.; Maillard, Nicolas
  • 2013 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), 2013 IEEE 27th International Symposium on Parallel and Distributed Processing https://doi.org/10.1109/IPDPS.2013.66
conference May 2013
High Performance RDMA-Based MPI Implementation over InfiniBand journal June 2004
Loci: a rule-based framework for parallel multi-disciplinary simulation synthesis journal May 2005
Hierarchical Task-Based Programming With StarSs journal June 2009
Fast Parallel Algorithms for Short-Range Molecular Dynamics journal March 1995

Cited By (32)

Thrust2D: A new design abstraction framework for structured grid class of algorithms journal July 2018
Classical molecular dynamics on graphics processing unit architectures journal August 2019
High Order Anchoring and Reinitialization of Level Set Function for Simulating Interface Motion journal November 2019
Direct simulation Monte Carlo on petaflop supercomputers and beyond journal August 2019
Large Eddy Simulation of a Supercritical Fuel Jet in Cross Flow using GPU-Acceleration conference January 2016
Evaluating Support for OpenMP Offload Features conference January 2018
Compiler Optimizations for Parallel Programs
  • Doerfert, Johannes; Finkel, Hal; Hall, Mary
  • Languages and Compilers for Parallel Computing: 31st International Workshop, LCPC 2018, Salt Lake City, UT, USA, October 9–11, 2018, Revised Selected Papers, p. 112-119 https://doi.org/10.1007/978-3-030-34627-0_9
book November 2019
Modeling of Dynamic Rock–Fluid Interaction Using Coupled 3-D Discrete Element and Lattice Boltzmann Methods journal May 2019
A large-scale study of MPI usage in open-source HPC applications
  • Laguna, Ignacio; Marshall, Ryan; Mohror, Kathryn
  • SC '19: The International Conference for High Performance Computing, Networking, Storage, and Analysis, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis https://doi.org/10.1145/3295500.3356176
conference November 2019
A High-performance and Portable All-Mach Regime Flow Solver Code with Well-balanced Gravity. Application to Compressible Convection journal April 2019
Preparing sparse solvers for exascale computing
  • Anzt, Hartwig; Boman, Erik; Falgout, Rob
  • Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, Vol. 378, Issue 2166 https://doi.org/10.1098/rsta.2019.0053
journal January 2020
Performance Portability of a Multiphysics Finite Element Code conference June 2018
A Study on the Performance Portability of the Finite Element Assembly Process Within the Albany Land Ice Solver book February 2020
Survey of Methodologies, Approaches, and Challenges in Parallel Programming Using High-Performance Computing Systems journal January 2020
HOMMEXX 1.0: a performance-portable atmospheric dynamical core for the Energy Exascale Earth System Model journal January 2019
Assessing the performance portability of modern parallel programming models using TeaLeaf
  • Martineau, Matthew; McIntosh-Smith, Simon; Gaudin, Wayne
  • Concurrency and Computation: Practice and Experience, Vol. 29, Issue 15 https://doi.org/10.1002/cpe.4117
journal March 2017
Status and future perspectives for lattice gauge theory calculations to the exascale and beyond journal November 2019
Register-Aware Optimizations for Parallel Sparse Matrix–Matrix Multiplication journal January 2019
InKS: a programming model to decouple algorithm from optimization in HPC codes journal July 2019
Evaluation of performance portability frameworks for the implementation of a particle‐in‐cell code journal December 2019
Performance-Portable Many-Core Plasma Simulations: Porting PIConGPU to OpenPower and Beyond book October 2016
Tiling-Based Programming Model for Structured Grids on GPU Clusters
  • Bastem, Burak; Unat, Didem
  • HPCAsia2020: International Conference on High Performance Computing in Asia-Pacific Region, Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region https://doi.org/10.1145/3368474.3368485
conference January 2020
MPAS-Albany Land Ice (MALI): a variable-resolution ice sheet model for Earth system modeling using Voronoi grids journal January 2018
Performance of preconditioned iterative solvers in MFiX–Trilinos for fluidized beds journal May 2018
Highly scalable discrete-particle simulations with novel coarse-graining: accessing the microscale text January 2018
STEEL-RT: combining single task–single executor model and expanded scheduling to ease heterogeneity exploitation journal August 2019
Portable multi- and many-core performance for finite-difference or finite-element codes – application to the free-surface component of NEMO (NEMOLite2D 1.0) journal January 2018
Early Performance Evaluation of the Hybrid Cluster with Torus Interconnect Aimed at Molecular-Dynamics Simulations book January 2018
Highly scalable discrete-particle simulations with novel coarse-graining: accessing the microscale journal May 2018
Performance-Portable Many-Core Plasma Simulations: Porting PIConGPU to OpenPower and Beyond text January 2016