OSTI.GOV: U.S. Department of Energy, Office of Scientific and Technical Information

Title: Kokkos: Enabling manycore performance portability through polymorphic memory access patterns

Abstract

The manycore revolution can be characterized by increasing thread counts, decreasing memory per thread, and diversity of continually evolving manycore architectures. High performance computing (HPC) applications and libraries must exploit increasingly finer levels of parallelism within their codes to sustain scalability on these devices. We found that a major obstacle to performance portability is the diverse and conflicting set of constraints on memory access patterns across devices. Contemporary portable programming models address manycore parallelism (e.g., OpenMP, OpenACC, OpenCL) but fail to address memory access patterns. The Kokkos C++ library enables applications and domain libraries to achieve performance portability on diverse manycore architectures by unifying abstractions for both fine-grain data parallelism and memory access patterns. In this paper we describe Kokkos’ abstractions, summarize its application programmer interface (API), present performance results for unit-test kernels and mini-applications, and outline an incremental strategy for migrating legacy C++ codes to Kokkos. Furthermore, the Kokkos library is under active research and development to incorporate capabilities from new generations of manycore architectures, and to address a growing list of applications and domain libraries.
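
As a rough illustration of the abstract's central idea, the following C++ sketch shows how a kernel can be written once while the array's memory layout is selected by a compile-time policy. It is not the Kokkos API: the names RowMajor, ColMajor, Array2D, and row_sum are hypothetical and exist only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout policies (not Kokkos types): each maps a 2D index
// (i, j) of an n0 x n1 array to an offset in flat storage.
struct RowMajor {
    // The second index j is contiguous in memory.
    static std::size_t offset(std::size_t i, std::size_t j,
                              std::size_t /*n0*/, std::size_t n1) {
        return i * n1 + j;
    }
};

struct ColMajor {
    // The first index i is contiguous in memory.
    static std::size_t offset(std::size_t i, std::size_t j,
                              std::size_t n0, std::size_t /*n1*/) {
        return i + j * n0;
    }
};

// Hypothetical 2D array whose memory access pattern is a template parameter.
template <class Layout>
class Array2D {
public:
    Array2D(std::size_t n0, std::size_t n1)
        : n0_(n0), n1_(n1), data_(n0 * n1, 0.0) {}

    // Flat offset of element (i, j) under the chosen layout.
    std::size_t offset(std::size_t i, std::size_t j) const {
        assert(i < n0_ && j < n1_);
        return Layout::offset(i, j, n0_, n1_);
    }

    double& operator()(std::size_t i, std::size_t j) {
        return data_[offset(i, j)];
    }
    double operator()(std::size_t i, std::size_t j) const {
        return data_[offset(i, j)];
    }

    std::size_t extent0() const { return n0_; }
    std::size_t extent1() const { return n1_; }

private:
    std::size_t n0_;
    std::size_t n1_;
    std::vector<double> data_;
};

// A layout-agnostic kernel: it only uses operator(), so the same source
// works for any layout policy.
template <class Layout>
double row_sum(const Array2D<Layout>& a, std::size_t i) {
    double sum = 0.0;
    for (std::size_t j = 0; j < a.extent1(); ++j) {
        sum += a(i, j);
    }
    return sum;
}
```

For a 2 x 3 array, element (1, 1) sits at flat offset 4 under RowMajor and at offset 3 under ColMajor, yet row_sum gives the same result for both because the kernel never touches the storage directly. Which layout performs better depends on the device and on how threads traverse the indices, which is the portability problem the abstract describes.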

Authors:
 Carter Edwards, H. [1]; Trott, Christian R. [1]; Sunderland, Daniel [1]
  1. Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
Publication Date:
July 2014
Research Org.:
Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
Sponsoring Org.:
USDOE National Nuclear Security Administration (NNSA)
OSTI Identifier:
1106586
Alternate Identifier(s):
OSTI ID: 1556442
Report Number(s):
SAND-2013-5603J
Journal ID: ISSN 0743-7315; PII: S0743731514001257
Grant/Contract Number:  
AC04-94AL85000
Resource Type:
Journal Article: Accepted Manuscript
Journal Name:
Journal of Parallel and Distributed Computing
Additional Journal Information:
Journal Volume: 74; Journal Issue: 12; Journal ID: ISSN 0743-7315
Publisher:
Elsevier
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; parallel computing; thread parallelism; manycore; GPU; performance portability; multidimensional array; mini-application

Citation Formats

Carter Edwards, H., Trott, Christian R., and Sunderland, Daniel. Kokkos: Enabling manycore performance portability through polymorphic memory access patterns. United States: N. p., 2014. Web. doi:10.1016/j.jpdc.2014.07.003.
Carter Edwards, H., Trott, Christian R., & Sunderland, Daniel. Kokkos: Enabling manycore performance portability through polymorphic memory access patterns. United States. doi:10.1016/j.jpdc.2014.07.003.
Carter Edwards, H., Trott, Christian R., and Sunderland, Daniel. "Kokkos: Enabling manycore performance portability through polymorphic memory access patterns". United States. doi:10.1016/j.jpdc.2014.07.003. https://www.osti.gov/servlets/purl/1106586.
@article{osti_1106586,
title = {Kokkos: Enabling manycore performance portability through polymorphic memory access patterns},
author = {Carter Edwards, H. and Trott, Christian R. and Sunderland, Daniel},
abstractNote = {The manycore revolution can be characterized by increasing thread counts, decreasing memory per thread, and diversity of continually evolving manycore architectures. High performance computing (HPC) applications and libraries must exploit increasingly finer levels of parallelism within their codes to sustain scalability on these devices. We found that a major obstacle to performance portability is the diverse and conflicting set of constraints on memory access patterns across devices. Contemporary portable programming models address manycore parallelism (e.g., OpenMP, OpenACC, OpenCL) but fail to address memory access patterns. The Kokkos C++ library enables applications and domain libraries to achieve performance portability on diverse manycore architectures by unifying abstractions for both fine-grain data parallelism and memory access patterns. In this paper we describe Kokkos’ abstractions, summarize its application programmer interface (API), present performance results for unit-test kernels and mini-applications, and outline an incremental strategy for migrating legacy C++ codes to Kokkos. Furthermore, the Kokkos library is under active research and development to incorporate capabilities from new generations of manycore architectures, and to address a growing list of applications and domain libraries.},
doi = {10.1016/j.jpdc.2014.07.003},
journal = {Journal of Parallel and Distributed Computing},
issn = {0743-7315},
number = 12,
volume = 74,
place = {United States},
year = {2014},
month = {7}
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record

Citation Metrics:
Cited by: 27 works
Citation information provided by Web of Science

Works referencing / citing this record:

Register-Aware Optimizations for Parallel Sparse Matrix–Matrix Multiplication
journal, January 2019

  • Liu, Junhong; He, Xin; Liu, Weifeng
  • International Journal of Parallel Programming, Vol. 47, Issue 3
  • DOI: 10.1007/s10766-018-0604-8

Direct simulation Monte Carlo on petaflop supercomputers and beyond
journal, August 2019

  • Plimpton, S. J.; Moore, S. G.; Borner, A.
  • Physics of Fluids, Vol. 31, Issue 8
  • DOI: 10.1063/1.5108534

Preparing sparse solvers for exascale computing
journal, January 2020

  • Anzt, Hartwig; Boman, Erik; Falgout, Rob
  • Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, Vol. 378, Issue 2166
  • DOI: 10.1098/rsta.2019.0053

Status and future perspectives for lattice gauge theory calculations to the exascale and beyond
journal, November 2019

  • Joó, Bálint; Jung, Chulwoo; Christ, Norman H.
  • The European Physical Journal A, Vol. 55, Issue 11
  • DOI: 10.1140/epja/i2019-12919-7

Large Eddy Simulation of a Supercritical Fuel Jet in Cross Flow using GPU-Acceleration
conference, January 2016

  • Gottiparthi, Kalyana C.; Sankaran, Ramanan; Ruiz, Anthony M.
  • 54th AIAA Aerospace Sciences Meeting
  • DOI: 10.2514/6.2016-1939

A High-performance and Portable All-Mach Regime Flow Solver Code with Well-balanced Gravity. Application to Compressible Convection
journal, April 2019

  • Padioleau, Thomas; Tremblin, Pascal; Audit, Edouard
  • The Astrophysical Journal, Vol. 875, Issue 2
  • DOI: 10.3847/1538-4357/ab0f2c

MPAS-Albany Land Ice (MALI): a variable-resolution ice sheet model for Earth system modeling using Voronoi grids
journal, January 2018

  • Hoffman, Matthew J.; Perego, Mauro; Price, Stephen F.
  • Geoscientific Model Development, Vol. 11, Issue 9
  • DOI: 10.5194/gmd-11-3747-2018

HOMMEXX 1.0: a performance-portable atmospheric dynamical core for the Energy Exascale Earth System Model
journal, January 2019

  • Bertagna, Luca; Deakin, Michael; Guba, Oksana
  • Geoscientific Model Development, Vol. 12, Issue 4
  • DOI: 10.5194/gmd-12-1423-2019