Manycore Performance-Portability: Kokkos Multidimensional Array Library
Abstract
Large, complex scientific and engineering application codes have a significant investment in computational kernels to implement their mathematical models. Porting these computational kernels to the collection of modern manycore accelerator devices is a major challenge in that these devices have diverse programming models, application programming interfaces (APIs), and performance requirements. The Kokkos Array programming model provides a library-based approach to implement computational kernels that are performance-portable to CPU-multicore and GPGPU accelerator devices. This programming model is based upon three fundamental concepts: (1) manycore compute devices, each with its own memory space, (2) data parallel kernels, and (3) multidimensional arrays. Kernel execution performance is, especially for NVIDIA® devices, extremely dependent on data access patterns. The optimal data access pattern can differ from one manycore device to another, potentially leading to different implementations of computational kernels specialized for different devices. The Kokkos Array programming model supports performance-portable kernels by (1) separating data access patterns from computational kernels through a multidimensional array API and (2) introducing device-specific data access mappings when a kernel is compiled. An implementation of Kokkos Array is available through Trilinos [Trilinos website, http://trilinos.sandia.gov/, August 2011].
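The separation the abstract describes, in which kernels are written against a multidimensional array interface and the index-to-memory mapping is chosen at compile time, can be illustrated with a minimal C++ sketch. This is not the Kokkos Array API: the names `RowMajor`, `ColumnMajor`, `Array2D`, `scale_row`, and `parallel_scale` are hypothetical and introduced only for illustration, and the data-parallel dispatch is stood in for by a serial loop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout policies (illustrative only, not the Kokkos Array API).
// RowMajor keeps the entries of one row contiguous, which suits a CPU thread
// that walks a whole row by itself.
struct RowMajor {
    static std::size_t index(std::size_t i, std::size_t j,
                             std::size_t /*rows*/, std::size_t cols) {
        return i * cols + j;
    }
};

// ColumnMajor places consecutive first indices next to each other, so that
// work items with consecutive i touch adjacent memory locations.
struct ColumnMajor {
    static std::size_t index(std::size_t i, std::size_t j,
                             std::size_t rows, std::size_t /*cols*/) {
        return j * rows + i;
    }
};

// Two-dimensional array whose memory mapping is a compile-time parameter.
template <class Layout>
class Array2D {
public:
    Array2D(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols, 0.0) {}

    double& operator()(std::size_t i, std::size_t j) {
        assert(i < rows_ && j < cols_);
        return data_[Layout::index(i, j, rows_, cols_)];
    }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

    // Exposes the flat storage so the effect of the layout can be inspected.
    const std::vector<double>& storage() const { return data_; }

private:
    std::size_t rows_;
    std::size_t cols_;
    std::vector<double> data_;
};

// Kernel body for one work item i. It uses only a(i, j), so the same source
// compiles unchanged for either layout.
template <class Layout>
void scale_row(Array2D<Layout>& a, std::size_t i, double factor) {
    for (std::size_t j = 0; j < a.cols(); ++j) {
        a(i, j) *= factor;
    }
}

// Serial stand-in for a data-parallel dispatch over the first index.
template <class Layout>
void parallel_scale(Array2D<Layout>& a, double factor) {
    for (std::size_t i = 0; i < a.rows(); ++i) {
        scale_row(a, i, factor);
    }
}
```

Instantiating `Array2D<RowMajor>` or `Array2D<ColumnMajor>` changes where each element lives in memory without any change to `scale_row`, which is the kind of compile-time mapping choice the abstract refers to.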
- Authors:
- Edwards, H. Carter; Sunderland, Daniel; Porter, Vicki; Amsler, Chris; Mish, Sam
- Computing Research Center, Sandia National Laboratories, Livermore, CA, USA
- Engineering Sciences Center, Sandia National Laboratories, Albuquerque, NM, USA
- Department of Electrical and Computer Engineering, Kansas State University, Manhattan, KS, USA
- Department of Mathematics, California State University, Los Angeles, CA, USA
- Publication Date:
- 2012
- Sponsoring Org.:
- USDOE
- OSTI Identifier:
- 1197983
- Grant/Contract Number:
- AC04-94AL85000; SAND2011-8102J
- Resource Type:
- Published Article
- Journal Name:
- Scientific Programming
- Additional Journal Information:
- Journal Name: Scientific Programming; Journal Volume: 20; Journal Issue: 2; Journal ID: ISSN 1058-9244
- Publisher:
- Hindawi Publishing Corporation
- Country of Publication:
- Egypt
- Language:
- English
Citation Formats
Edwards, H. Carter, Sunderland, Daniel, Porter, Vicki, Amsler, Chris, and Mish, Sam. Manycore Performance-Portability: Kokkos Multidimensional Array Library. Egypt: N. p., 2012. Web. doi:10.1155/2012/917630.
Edwards, H. Carter, Sunderland, Daniel, Porter, Vicki, Amsler, Chris, & Mish, Sam. Manycore Performance-Portability: Kokkos Multidimensional Array Library. Egypt. https://doi.org/10.1155/2012/917630
Edwards, H. Carter, Sunderland, Daniel, Porter, Vicki, Amsler, Chris, and Mish, Sam. 2012. "Manycore Performance-Portability: Kokkos Multidimensional Array Library". Egypt. https://doi.org/10.1155/2012/917630.
@article{osti_1197983,
title = {Manycore Performance-Portability: Kokkos Multidimensional Array Library},
author = {Edwards, H. Carter and Sunderland, Daniel and Porter, Vicki and Amsler, Chris and Mish, Sam},
abstractNote = {Large, complex scientific and engineering application codes have a significant investment in computational kernels to implement their mathematical models. Porting these computational kernels to the collection of modern manycore accelerator devices is a major challenge in that these devices have diverse programming models, application programming interfaces (APIs), and performance requirements. The Kokkos Array programming model provides a library-based approach to implement computational kernels that are performance-portable to CPU-multicore and GPGPU accelerator devices. This programming model is based upon three fundamental concepts: (1) manycore compute devices, each with its own memory space, (2) data parallel kernels, and (3) multidimensional arrays. Kernel execution performance is, especially for NVIDIA® devices, extremely dependent on data access patterns. The optimal data access pattern can differ from one manycore device to another, potentially leading to different implementations of computational kernels specialized for different devices. The Kokkos Array programming model supports performance-portable kernels by (1) separating data access patterns from computational kernels through a multidimensional array API and (2) introducing device-specific data access mappings when a kernel is compiled. An implementation of Kokkos Array is available through Trilinos [Trilinos website, http://trilinos.sandia.gov/, August 2011].},
doi = {10.1155/2012/917630},
journal = {Scientific Programming},
number = 2,
volume = 20,
place = {Egypt},
year = {2012},
month = {1}
}
https://doi.org/10.1155/2012/917630