U.S. Department of Energy
Office of Scientific and Technical Information

Exploring Integer Sum Reduction using Atomics on Intel CPU

Conference
OSTI ID:1515074

Atomic functions are useful for updating a shared variable from multiple threads, implementing barrier synchronization, constructing complex data structures, and building high-level frameworks. In this paper, we evaluate and analyze integer sum reduction, a common data-parallel primitive. We convert the sequential reduction into parallel OpenCL implementations on the CPU. We also develop three micro kernels that help us understand the relationship between kernel performance and the operations involved in reduction. The micro-kernel results show that kernel performance improves linearly as the work-group size increases linearly. There is a sweet spot in the relationship between work-group size and barrier synchronization overhead. The performance of atomics over local memory is not sensitive to the work-group size. The sum reduction kernel with vectorized memory accesses improves on the performance of the baseline kernel for a wide range of work-group sizes. However, the vectorization efficiency shrinks as the work-group size grows. We also find that the vendor’s default OpenCL kernel optimization does not improve kernel performance. On average, disabling the optimization reduces the execution time of the kernel with vectorized memory accesses by 15%. We attribute the performance drop under the default optimizations to the fact that they instantiate a large number of atomics over global memory when implicitly vectorizing the kernel computation.

Research Organization:
Argonne National Laboratory (ANL)
Sponsoring Organization:
USDOE Office of Science
DOE Contract Number:
AC02-06CH11357
OSTI ID:
1515074
Country of Publication:
United States
Language:
English

Similar Records

Population Count on Intel® CPU, GPU, and FPGA
Conference · Tue Dec 31 23:00:00 EST 2019 · OSTI ID:1804082

Evaluating the Performance of Integer Sum Reduction on an Intel GPU
Conference · Tue Jun 01 00:00:00 EDT 2021 · OSTI ID:1840205

Evaluating and Optimizing OpenCL Base64 Data Unpacking Kernel with FPGA
Conference · Sun Dec 31 23:00:00 EST 2017 · OSTI ID:1481854