Exploring Integer Sum Reduction using Atomics on Intel CPU
Atomic functions are useful for updating a shared variable from multiple threads, implementing barrier synchronizations, constructing complex data structures, and building high-level frameworks. In this paper, we focus on the evaluation and analysis of integer sum reduction, a common data-parallel primitive. We convert the sequential reduction into parallel OpenCL implementations on the CPU. We also develop three micro-kernels, which allow us to understand the relationships between kernel performance and the operations involved in reduction. The micro-kernel results show that kernel performance improves linearly as the work-group size increases, and that there is a sweet spot in the trade-off between work-group size and barrier-synchronization overhead. The performance of the atomics over local memory is not sensitive to the work-group size. The sum reduction kernel with vectorized memory accesses outperforms the baseline kernel for a wide range of work-group sizes; however, the vectorization efficiency shrinks as the work-group size grows. We also find that the vendor’s default OpenCL kernel optimization does not improve kernel performance. On average, disabling the optimization reduces the execution time of the kernel with vectorized memory accesses by 15%. We attribute the performance drop to the fact that the default kernel optimizations instantiate a large number of atomics over global memory when implicitly vectorizing the kernel computation.
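As a rough illustration of the approach described in the abstract, the following OpenCL C sketch shows a baseline integer sum reduction: each work-item adds its element to a work-group accumulator with an atomic over local memory, and one work-item per group commits the partial sum with an atomic over global memory. The kernel name and signature are hypothetical, not the authors' code.

```c
// Hypothetical baseline sum-reduction kernel (illustrative sketch only).
// Assumes the host zero-initializes *result before launch.
__kernel void sum_reduce(__global const int *input,
                         __global int *result,
                         const int n)
{
    __local int local_sum;

    int gid = get_global_id(0);
    int lid = get_local_id(0);

    // One work-item per group initializes the local accumulator.
    if (lid == 0)
        local_sum = 0;
    barrier(CLK_LOCAL_MEM_FENCE);

    // Atomic over local memory: every work-item contributes its element.
    if (gid < n)
        atomic_add(&local_sum, input[gid]);
    barrier(CLK_LOCAL_MEM_FENCE);

    // Atomic over global memory: a single update per work-group.
    if (lid == 0)
        atomic_add(result, local_sum);
}
```

A vectorized variant (also hypothetical) in the spirit of the kernel with vectorized memory accesses loads an int4 per work-item and pre-reduces it before touching the local accumulator, so the per-element cost of the local atomic is amortized over four integers.

```c
// Hypothetical vectorized variant: n4 is the number of int4 elements.
__kernel void sum_reduce_vec4(__global const int4 *input,
                              __global int *result,
                              const int n4)
{
    __local int local_sum;

    int gid = get_global_id(0);
    int lid = get_local_id(0);

    if (lid == 0)
        local_sum = 0;
    barrier(CLK_LOCAL_MEM_FENCE);

    if (gid < n4) {
        int4 v = input[gid];
        // Pre-reduce the vector lanes, then issue one local atomic.
        atomic_add(&local_sum, v.x + v.y + v.z + v.w);
    }
    barrier(CLK_LOCAL_MEM_FENCE);

    if (lid == 0)
        atomic_add(result, local_sum);
}
```

Restricting the global atomic to one per work-group reflects the abstract's observation that a large number of atomics over global memory hurts performance.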
- Research Organization: Argonne National Laboratory (ANL)
- Sponsoring Organization: USDOE Office of Science
- DOE Contract Number: AC02-06CH11357
- OSTI ID: 1515074
- Country of Publication: United States
- Language: English