Experiences with SYCL on AMD GPUs with Kokkos

Arndt, Daniel; Lebrun-Grandie, Damien; Trott, Christian

doi:10.2172/3016989

Technical Report · Mon Dec 01 00:00:00 EST 2025

DOI: https://doi.org/10.2172/3016989 · OSTI ID: 3016989

Arndt, Daniel ^[1]; Lebrun-Grandie, Damien ^[1]; Trott, Christian ^[2]

^[1] Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
^[2] Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

With the recent diversification of the hardware landscape in the high-performance computing (HPC) community, performance-portability solutions are becoming increasingly important. One of the most popular choices is Kokkos, which recently became a Linux Foundation project. Most of its development is supported by the US Department of Energy and the French Alternative Energies and Atomic Energy Commission. Kokkos is implemented as a C++ library with multiple backends to support CPUs as well as various GPU architectures. These backends include OpenMP, CUDA, HIP, and SYCL. This approach enables users to leverage the preferred vendor toolchain for the respective platform (e.g., CUDA, ROCm, oneAPI). The SYCL backend is used to target Intel GPUs, in particular to support the Aurora exascale supercomputer. However, SYCL itself also offers a large degree of portability, and in fact Kokkos’ CI for SYCL has been running on NVIDIA hardware due to a lack of access to Intel GPUs.

In this report, we describe our experience with using the Kokkos SYCL backend on AMD GPUs, targeting the Frontier supercomputer at Oak Ridge National Laboratory. The two major SYCL implementations are DPC++ and AdaptiveCpp. While the Kokkos SYCL backend has been implemented using the former, the latter was the first implementation to target AMD GPUs. We discuss our experience with both of these SYCL implementations in terms of functionality and performance. Using Kokkos to evaluate SYCL toolchains has a number of benefits. Kokkos’ use of SYCL is fairly complex, exercising features such as graphs, relocatable device functions, atomics (including for non-arithmetic types), and pinned and page-migratable memory allocations. Kokkos also needs to implement capabilities, such as its hierarchical parallelism, that do not map straightforwardly onto SYCL capabilities. Furthermore, a large number of libraries and applications that represent diverse use cases are implemented in Kokkos, providing readily available test cases for a toolchain evaluation.

Preliminary results show that support for AMD GPUs in DPC++ is much less mature than for NVIDIA GPUs or Intel GPUs. While the situation has improved significantly over the last year, we still encounter many runtime failures, dispatching problems, and code generation issues. With AdaptiveCpp, the challenges arise even earlier in the evaluation process. Since Kokkos’ SYCL implementation is largely focused on supporting Intel GPUs, we opted to leverage SYCL extensions that are available in DPC++ but not in AdaptiveCpp. Furthermore, AdaptiveCpp appears to be less conformant with the SYCL 2020 standard, which Kokkos relies on. In some cases, we are able to work around the lack of feature support; in other cases, we have to disable certain Kokkos capabilities to evaluate the toolchain. Our evaluation leverages Kokkos’ unit tests to establish basic functionality and feature completeness. We then use simple benchmarks for components of a conjugate gradient (CG) implementation as a measure of usability and performance of the SYCL toolchains.
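To make "components of a CG implementation" concrete, the sketch below shows the three kernels such a benchmark typically times: a dot product, a vector update, and a sparse matrix-vector product. It is not the report's benchmark code. It is plain serial C++ so it stays self-contained, and the names `dot`, `axpby`, `spmv_laplace1d`, and `cg_solve`, as well as the choice of a matrix-free 1D Laplacian, are assumptions made for illustration. In a Kokkos version, the reduction loop would presumably map to `Kokkos::parallel_reduce` and the other loops to `Kokkos::parallel_for`.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// All names below are hypothetical; none are taken from the report.
using Vec = std::vector<double>;

// Dot product: the reduction kernel of CG.
double dot(const Vec& x, const Vec& y) {
  double sum = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i) sum += x[i] * y[i];
  return sum;
}

// Vector update z = alpha*x + beta*y. Purely elementwise, so z may alias x or y.
void axpby(double alpha, const Vec& x, double beta, const Vec& y, Vec& z) {
  for (std::size_t i = 0; i < x.size(); ++i) z[i] = alpha * x[i] + beta * y[i];
}

// Matrix-free y = A*x for the 1D Laplacian (2 on the diagonal, -1 next to it).
// Assumed here as a stand-in for a general sparse matrix-vector product.
void spmv_laplace1d(const Vec& x, Vec& y) {
  const std::size_t n = x.size();
  for (std::size_t i = 0; i < n; ++i) {
    double v = 2.0 * x[i];
    if (i > 0) v -= x[i - 1];
    if (i + 1 < n) v -= x[i + 1];
    y[i] = v;
  }
}

// Unpreconditioned conjugate gradient for A*x = b, starting from the given x.
// Returns the number of iterations performed.
int cg_solve(const Vec& b, Vec& x, double tol, int max_iters) {
  const std::size_t n = b.size();
  Vec r(n), p(n), Ap(n);
  spmv_laplace1d(x, Ap);
  axpby(1.0, b, -1.0, Ap, r);  // r = b - A*x
  p = r;
  double rr = dot(r, r);
  int iters = 0;
  while (iters < max_iters && std::sqrt(rr) > tol) {
    spmv_laplace1d(p, Ap);
    const double alpha = rr / dot(p, Ap);
    axpby(1.0, x, alpha, p, x);        // x = x + alpha*p
    axpby(1.0, r, -alpha, Ap, r);      // r = r - alpha*A*p
    const double rr_new = dot(r, r);
    axpby(1.0, r, rr_new / rr, p, p);  // p = r + beta*p
    rr = rr_new;
    ++iters;
  }
  return iters;
}
```

Calling `cg_solve(b, x, 1e-10, 100)` with `x` zero-initialized and the same length as `b` overwrites `x` with the approximate solution. Timing the three kernels separately, rather than only the full solve, helps attribute a slowdown to reductions, streaming updates, or the stencil-like access pattern.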

Research Organization: Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)

Sponsoring Organization: USDOE

DOE Contract Number: AC05-00OR22725

OSTI ID: 3016989

Report Number(s): ORNL/TM--2025/4398

Country of Publication: United States

Language: English

Similar Records

Case Study of Using Kokkos and SYCL as Performance-Portable Frameworks for Milc-Dslash Benchmark on NVIDIA, AMD and Intel GPUs

Conference · Thu Dec 31 23:00:00 EST 2020 · OSTI ID:1892057

Experiences with implementing Kokkos’ SYCL backend

Conference · Mon Apr 01 00:00:00 EDT 2024 · OSTI ID:2336667

Understanding Performance Portability of SYCL Kernels: A Case Study with the All-Pairs Distance Calculation in Bioinformatics on GPUs

Conference · Mon May 01 00:00:00 EDT 2023 · OSTI ID:1996690

Related Subjects

97 MATHEMATICS AND COMPUTING
