Title: Porting Numerical Integration Codes from CUDA to oneAPI: A Case Study

Journal Article · Lecture Notes in Computer Science

Herein, we present our experience in porting optimized CUDA implementations to oneAPI. We focus on the use case of numerical integration, particularly the CUDA implementations of PAGANI and m-Cubes. We faced several challenges that caused performance degradation in the oneAPI ports. These include differences in utilized registers per thread, compiler optimizations, and mappings of CUDA library calls to oneAPI equivalents. After addressing those challenges, we tested both the PAGANI and m-Cubes integrators on numerous integrands of various characteristics. To evaluate the quality of the ports, we collected performance metrics of the CUDA and oneAPI implementations on the Nvidia V100 GPU. We found that the oneAPI ports often achieve comparable performance to the CUDA versions, and that they are at most 10% slower.

Research Organization:
Fermi National Accelerator Laboratory (FNAL), Batavia, IL (United States)
Sponsoring Organization:
USDOE Office of Science (SC), High Energy Physics (HEP)
Grant/Contract Number:
AC02-07CH11359; AC05-06OR23177; AC02-06CH11357
OSTI ID:
1969670
Report Number(s):
FERMILAB-CONF-23-007-LDRD-SCD; arXiv:2302.05730; oai:inspirehep.net:2643005; TRN: US2313429
Journal Information:
Lecture Notes in Computer Science, Conference: 38th International Conference, ISC High Performance 2023, Hamburg (Germany), May 21-25, 2023; ISSN 0302-9743
Publisher:
Springer
Country of Publication:
United States
Language:
English
