The manycore revolution can be characterized by increasing thread counts, decreasing memory per thread, and diversity of continually evolving manycore architectures. High performance computing (HPC) applications and libraries must exploit increasingly finer levels of parallelism within their codes to sustain scalability on these devices. We found that a major obstacle to performance portability is the diverse and conflicting set of constraints on memory access patterns across devices. Contemporary portable programming models address manycore parallelism (e.g., OpenMP, OpenACC, OpenCL) but fail to address memory access patterns. The Kokkos C++ library enables applications and domain libraries to achieve performance portability on diverse manycore architectures by unifying abstractions for both fine-grain data parallelism and memory access patterns. In this paper we describe Kokkos’ abstractions, summarize its application programmer interface (API), present performance results for unit-test kernels and mini-applications, and outline an incremental strategy for migrating legacy C++ codes to Kokkos. Furthermore, the Kokkos library is under active research and development to incorporate capabilities from new generations of manycore architectures, and to address a growing list of applications and domain libraries.
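The abstract's central idea, that a kernel can be written once while the memory access pattern is chosen per device, can be illustrated with a minimal sketch in plain C++. This is not Kokkos code and does not use the Kokkos API. The names `RowMajor`, `ColMajor`, `Array2D`, and `fill_and_sum` are hypothetical, chosen only for this illustration, and the sketch assumes a simple two-dimensional array of `double` on the host.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout policies (illustrative names, not Kokkos classes).
// Each maps a logical index (i, j) to an offset in flat storage.
struct RowMajor {
    static std::size_t index(std::size_t i, std::size_t j,
                             std::size_t /*rows*/, std::size_t cols) {
        return i * cols + j;  // elements of one row are contiguous
    }
};

struct ColMajor {
    static std::size_t index(std::size_t i, std::size_t j,
                             std::size_t rows, std::size_t /*cols*/) {
        return i + j * rows;  // elements of one column are contiguous
    }
};

// A 2D array whose memory layout is a compile-time template parameter.
template <class Layout>
class Array2D {
public:
    Array2D(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols, 0.0) {}

    double& operator()(std::size_t i, std::size_t j) {
        assert(i < rows_ && j < cols_);
        return data_[Layout::index(i, j, rows_, cols_)];
    }

    // Flat offset of (i, j), exposed only to make the layout visible.
    std::size_t offset(std::size_t i, std::size_t j) const {
        return Layout::index(i, j, rows_, cols_);
    }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

private:
    std::size_t rows_;
    std::size_t cols_;
    std::vector<double> data_;
};

// Kernel written once against the logical (i, j) interface.
// It fills the array with i * 10 + j and returns the sum of all entries.
template <class Layout>
double fill_and_sum(Array2D<Layout>& a) {
    double sum = 0.0;
    for (std::size_t i = 0; i < a.rows(); ++i) {
        for (std::size_t j = 0; j < a.cols(); ++j) {
            a(i, j) = static_cast<double>(i * 10 + j);
            sum += a(i, j);
        }
    }
    return sum;
}
```

Calling `fill_and_sum` on an `Array2D<RowMajor>` and on an `Array2D<ColMajor>` of the same shape gives the same result, while the order in which memory is touched differs. In this sketch the layout is picked by hand; the abstract describes a library that makes that choice part of a portable abstraction.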
Edwards, H. Carter, Trott, Christian R., & Sunderland, Daniel (2014). Kokkos: Enabling manycore performance portability through polymorphic memory access patterns. Journal of Parallel and Distributed Computing, 74(12). https://doi.org/10.1016/j.jpdc.2014.07.003
Edwards, H. Carter, Trott, Christian R., and Sunderland, Daniel, "Kokkos: Enabling manycore performance portability through polymorphic memory access patterns," Journal of Parallel and Distributed Computing 74, no. 12 (2014), https://doi.org/10.1016/j.jpdc.2014.07.003
@article{osti_1106586,
author = {Edwards, H. Carter and Trott, Christian R. and Sunderland, Daniel},
title = {Kokkos: Enabling manycore performance portability through polymorphic memory access patterns},
annote = {The manycore revolution can be characterized by increasing thread counts, decreasing memory per thread, and diversity of continually evolving manycore architectures. High performance computing (HPC) applications and libraries must exploit increasingly finer levels of parallelism within their codes to sustain scalability on these devices. We found that a major obstacle to performance portability is the diverse and conflicting set of constraints on memory access patterns across devices. Contemporary portable programming models address manycore parallelism (e.g., OpenMP, OpenACC, OpenCL) but fail to address memory access patterns. The Kokkos C++ library enables applications and domain libraries to achieve performance portability on diverse manycore architectures by unifying abstractions for both fine-grain data parallelism and memory access patterns. In this paper we describe Kokkos’ abstractions, summarize its application programmer interface (API), present performance results for unit-test kernels and mini-applications, and outline an incremental strategy for migrating legacy C++ codes to Kokkos. Furthermore, the Kokkos library is under active research and development to incorporate capabilities from new generations of manycore architectures, and to address a growing list of applications and domain libraries.},
doi = {10.1016/j.jpdc.2014.07.003},
url = {https://www.osti.gov/biblio/1106586},
journal = {Journal of Parallel and Distributed Computing},
issn = {0743-7315},
number = {12},
volume = {74},
place = {United States},
publisher = {Elsevier},
year = {2014},
month = {07}}