U.S. Department of Energy
Office of Scientific and Technical Information

Experiences with SYCL on AMD GPUs with Kokkos

Technical Report · DOI: https://doi.org/10.2172/3016989 · OSTI ID: 3016989
With the recent diversification of the hardware landscape in the high-performance computing (HPC) community, performance-portability solutions are becoming increasingly important. One of the most popular choices is Kokkos, which recently became a Linux Foundation project. Most of its development is supported by the US Department of Energy and the French Alternative Energies and Atomic Energy Commission. Kokkos is implemented as a C++ library with multiple backends to support CPUs as well as various GPU architectures. These backends include OpenMP, CUDA, HIP, and SYCL. This approach enables users to leverage the preferred vendor toolchain for the respective platform (e.g., CUDA, ROCm, oneAPI). The SYCL backend is used to target Intel GPUs, in particular to support the Aurora exascale supercomputer. However, SYCL itself also offers a large degree of portability, and in fact Kokkos’ CI for SYCL has been running on NVIDIA hardware due to a lack of access to Intel GPUs. In this report, we describe our experience with using the Kokkos SYCL backend on AMD GPUs, targeting the Frontier supercomputer at Oak Ridge National Laboratory. The two major SYCL implementations are DPC++ and AdaptiveCpp. While the Kokkos SYCL backend has been implemented using the former, the latter was the first implementation to target AMD GPUs. We will discuss the experience with both of these SYCL implementations in terms of functionality and performance. Using Kokkos to evaluate SYCL toolchains has a number of benefits. Kokkos’ use of SYCL is fairly complex, exercising features such as graphs, relocatable device functions, atomics (including for non-arithmetic types), as well as pinned and page-migratable memory allocations. Kokkos also needs to implement capabilities, such as its hierarchical parallelism, that do not map straightforwardly onto SYCL capabilities.
Furthermore, a large number of libraries and applications that represent diverse use cases are implemented in Kokkos, providing readily available test cases for a toolchain evaluation. Preliminary results show that support for AMD GPUs in DPC++ is much less mature than for NVIDIA GPUs or Intel GPUs. While the situation has improved significantly over the last year, we still encounter many runtime failures, dispatching problems, and code generation issues. With AdaptiveCpp, the challenges arise even earlier in the evaluation process. Since Kokkos’ SYCL implementation is largely focused on supporting Intel GPUs, we opted to leverage SYCL extensions that are available in DPC++ but not in AdaptiveCpp. Furthermore, AdaptiveCpp appears to be less conformant with the SYCL 2020 standard, which Kokkos relies on. In some cases, we are able to work around the lack of feature support; in other cases, we have to disable certain Kokkos capabilities to evaluate the toolchain. Our evaluation will leverage Kokkos’ unit tests to establish basic functionality and feature completeness. We then use simple benchmarks for components of a CG implementation as a measure of usability and performance of the SYCL toolchains.
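To make "components of a CG implementation" concrete, the following is a minimal serial sketch of the kernels that a conjugate gradient (CG) solver is typically built from: a vector update (AXPY), a dot product, and a sparse matrix-vector product. The abstract does not say which components the report benchmarks, so this selection is an assumption, and the names `axpy`, `dot`, `spmv`, and `CsrMatrix` are hypothetical and not taken from the report or from Kokkos. The code uses only the C++ standard library and is not the Kokkos or SYCL implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical serial reference kernels; names are illustrative only.

// Vector update: y <- alpha * x + y
void axpy(double alpha, const std::vector<double>& x, std::vector<double>& y) {
  assert(x.size() == y.size());
  for (std::size_t i = 0; i < x.size(); ++i) {
    y[i] += alpha * x[i];
  }
}

// Dot product: returns sum_i x[i] * y[i]
double dot(const std::vector<double>& x, const std::vector<double>& y) {
  assert(x.size() == y.size());
  double sum = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i) {
    sum += x[i] * y[i];
  }
  return sum;
}

// Sparse matrix in compressed sparse row (CSR) format.
struct CsrMatrix {
  std::vector<std::size_t> row_ptr;  // size: number of rows + 1
  std::vector<std::size_t> col_idx;  // column index of each stored entry
  std::vector<double> values;        // value of each stored entry
};

// Sparse matrix-vector product: y <- A * x
void spmv(const CsrMatrix& A, const std::vector<double>& x,
          std::vector<double>& y) {
  assert(A.row_ptr.size() == y.size() + 1);
  for (std::size_t row = 0; row < y.size(); ++row) {
    double sum = 0.0;
    for (std::size_t k = A.row_ptr[row]; k < A.row_ptr[row + 1]; ++k) {
      sum += A.values[k] * x[A.col_idx[k]];
    }
    y[row] = sum;
  }
}
```

In a Kokkos version, loops of this shape would typically be expressed with `Kokkos::parallel_for` (AXPY, the row loop of the sparse product) and `Kokkos::parallel_reduce` (dot product), which is what lets the same source be dispatched to the SYCL backend under evaluation.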
Research Organization:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Sponsoring Organization:
USDOE
DOE Contract Number:
AC05-00OR22725
OSTI ID:
3016989
Report Number(s):
ORNL/TM--2025/4398
Country of Publication:
United States
Language:
English