Evaluating Nonuniform Reduction in HIP and SYCL on GPUs

Jin, Zheming; Vetter, Jeffrey

doi:10.1109/DRBSD56682.2022.00010


Conference · November 1, 2022

DOI: https://doi.org/10.1109/DRBSD56682.2022.00010 · OSTI ID: 1996715

Jin, Zheming ^[1]; Vetter, Jeffrey ^[1]

ORNL

Motivated by maturing programming models and portability for heterogeneous computing, we describe the challenges posed by hardware architectures and programming models when migrating an optimized implementation of nonuniform reduction from CUDA to HIP and SYCL. We explain the migration experience, evaluate the performance of the reduction on GPU-based computing platforms, and provide feedback on improving portability for the development of the SYCL programming model.


Research Organization: Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)

Sponsoring Organization: USDOE; USDOE Office of Science (SC)

DOE Contract Number: AC05-00OR22725

OSTI ID: 1996715

Country of Publication: United States

Language: English


