U.S. Department of Energy
Office of Scientific and Technical Information

Case Study of Using Kokkos and SYCL as Performance-Portable Frameworks for Milc-Dslash Benchmark on NVIDIA, AMD and Intel GPUs

Conference

Six of the top ten supercomputers in the TOP500 list from June 2021 rely on NVIDIA GPUs to achieve their peak compute bandwidth. With the announcement of Aurora, Frontier, and El Capitan, Intel and AMD have also entered the domain of providing GPUs for scientific computing. A consequence of the increased diversity in the GPU landscape is the emergence of portable programming models such as Kokkos, SYCL, OpenCL, and OpenMP, which allow application developers to maintain a single-source code base across a diverse range of hardware architectures. While the portable frameworks try to optimize the use of compute resources on a given architecture, it is the programmer's responsibility to expose enough parallelism in an application to take advantage of the thousands of processing elements available on GPUs. In this paper, we introduce a GPU-friendly parallel implementation of Milc-Dslash that exposes multiple hierarchies of parallelism in the algorithm. Milc-Dslash was designed to serve as a benchmark with highly optimized matrix-vector multiplications to measure resource utilization on GPU systems. The parallel hierarchies in the Milc-Dslash algorithm are mapped onto the target hardware using the Kokkos and SYCL programming models. We present the performance achieved by the Kokkos and SYCL implementations of Milc-Dslash on an NVIDIA A100 GPU, an AMD MI100 GPU, and an Intel Gen9 GPU. Additionally, we compare the Kokkos and SYCL performance with that of versions written in the CUDA and HIP programming models on the NVIDIA A100 GPU and AMD MI100 GPU, respectively.
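To make the idea of hierarchical parallelism in a Dslash-style kernel concrete, the following is a minimal serial C++ sketch. It is not the implementation from the paper. The names `dslash_sketch`, `ColorVector`, `LinkMatrix`, and `kDirs`, the four-direction neighbor layout, and the flattened (site, row) index are all assumptions made for illustration. Each (site, row) pair is an independent unit of work, and the inner sums over directions and color components form a further level that a framework could map onto finer-grained hardware parallelism.

```cpp
#include <array>
#include <complex>
#include <cstddef>
#include <vector>

using Complex = std::complex<double>;
// Assumed layout: 3 color components per lattice site.
using ColorVector = std::array<Complex, 3>;
// Assumed layout: one 3x3 complex link matrix per site and direction.
using LinkMatrix = std::array<std::array<Complex, 3>, 3>;

// Assumption for this sketch: four directions per site.
constexpr std::size_t kDirs = 4;

// Hypothetical simplified Dslash-like kernel (illustration only):
//   out[s][r] = sum_d sum_c links[s][d][r][c] * in[neighbor[s][d]][c]
std::vector<ColorVector> dslash_sketch(
    const std::vector<std::array<LinkMatrix, kDirs>>& links,
    const std::vector<std::array<std::size_t, kDirs>>& neighbor,
    const std::vector<ColorVector>& in)
{
    const std::size_t n_sites = in.size();
    std::vector<ColorVector> out(n_sites);  // value-initialized to zero

    // Outer level: every (site, row) pair is independent, so this
    // flattened loop is the part a parallel dispatch would distribute.
    for (std::size_t idx = 0; idx < n_sites * 3; ++idx) {
        const std::size_t s = idx / 3;  // lattice site
        const std::size_t r = idx % 3;  // row of the matrix-vector product

        // Inner level: reduction over directions and color components.
        Complex acc{0.0, 0.0};
        for (std::size_t d = 0; d < kDirs; ++d) {
            const ColorVector& v = in[neighbor[s][d]];
            for (std::size_t c = 0; c < 3; ++c) {
                acc += links[s][d][r][c] * v[c];
            }
        }
        out[s][r] = acc;
    }
    return out;
}
```

In a portable framework, the flattened outer loop would typically be handed to a parallel dispatch such as `Kokkos::parallel_for` or a SYCL `parallel_for`, while the choice of how to split the inner reduction is where the different parallel hierarchies come into play.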

Research Organization:
Argonne National Laboratory (ANL)
Sponsoring Organization:
USDOE Office of Science; Argonne National Laboratory - Argonne Leadership Computing Facility
DOE Contract Number:
AC02-06CH11357
OSTI ID:
1892057
Country of Publication:
United States
Language:
English

