Population Count on Intel® CPU, GPU, and FPGA
Population count is a primitive used in many applications, and commodity processors provide dedicated instructions for computing it at high performance. Motivated by the productivity of high-level synthesis and the importance of population count, in this paper we investigate OpenCL implementations of population count algorithms and evaluate their performance and resource utilization on an FPGA. Based on the results, we select the most efficient implementation. We then derive a reduction pattern from a representative application of population count. We parallelize the reduction with atomic functions and optimize it with vectorized memory accesses, tree reduction, and compute-unit duplication. We evaluate the performance of the reduction kernel on an Intel® Xeon® CPU, an Intel® Iris™ Pro integrated GPU, and an FPGA card that features an Intel® Arria® 10 FPGA. When DRAM bandwidth is comparable across the three computing platforms, the FPGA achieves the highest kernel performance for large workloads. We also describe the performance bottlenecks on the FPGA. To make FPGAs more competitive in raw performance with high-performance CPU and GPU platforms, it is important to increase external memory bandwidth, minimize data movement between the host and the device, and reduce OpenCL runtime overhead on the FPGA.
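As a rough illustration of the kind of computation the abstract refers to, the following plain C sketch shows one common bit-parallel population count and a sum over an array in which four per-word counts are combined pairwise, in the spirit of a small tree reduction. It is not the OpenCL kernel from the paper, and the abstract does not say which population count algorithm was selected. The function names `popcount32` and `popcount_reduce` and the group size of four are assumptions made for this example; atomic functions and compute-unit duplication are not shown.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit-parallel population count of one 32-bit word.
 * Hypothetical helper for illustration, not the paper's implementation. */
uint32_t popcount32(uint32_t x)
{
    x = x - ((x >> 1) & 0x55555555u);                 /* counts per 2-bit field */
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u); /* counts per 4-bit field */
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;                 /* counts per byte */
    return (x * 0x01010101u) >> 24;                   /* sum of the four bytes */
}

/* Sum the population counts of n words.
 * Four words are handled per iteration and their counts are added
 * pairwise; the group size of four is an assumption for this sketch. */
uint64_t popcount_reduce(const uint32_t *data, size_t n)
{
    uint64_t total = 0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        uint32_t c0 = popcount32(data[i]);
        uint32_t c1 = popcount32(data[i + 1]);
        uint32_t c2 = popcount32(data[i + 2]);
        uint32_t c3 = popcount32(data[i + 3]);
        total += (uint64_t)((c0 + c1) + (c2 + c3));   /* two-level pairwise add */
    }
    for (; i < n; ++i) {                              /* leftover words */
        total += popcount32(data[i]);
    }
    return total;
}
```

In an OpenCL setting, each work-item would typically compute a partial sum like the inner group above and then combine it into a shared result, which is where the atomic functions mentioned in the abstract come in.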
- Research Organization: Argonne National Laboratory (ANL)
- Sponsoring Organization: USDOE Office of Science
- DOE Contract Number: AC02-06CH11357
- OSTI ID: 1804082
- Country of Publication: United States
- Language: English
Similar Records
Evaluating LULESH Kernels on OpenCL FPGA
Conference · Mon Dec 31 23:00:00 EST 2018 · OSTI ID: 1528953
Evaluating and Optimizing OpenCL Base64 Data Unpacking Kernel with FPGA
Conference · Sun Dec 31 23:00:00 EST 2017 · OSTI ID: 1481854
Evaluation of CHO Benchmarks on the Arria 10 FPGA using Intel FPGA SDK for OpenCL
Technical Report · Tue May 23 00:00:00 EDT 2017 · OSTI ID: 1372106