Efficient Execution of Recursive Programs on Commodity Vector Hardware

Ren, Bin; Jo, Youngjoon; Krishnamoorthy, Sriram; Agrawal, Kunal; Kulkarni, Milind

doi:10.1145/2737924.2738004

Conference · Sat Jun 13 00:00:00 EDT 2015

DOI: https://doi.org/10.1145/2737924.2738004 · OSTI ID: 1194297

The pursuit of computational efficiency has led to the proliferation of throughput-oriented hardware, from GPUs to increasingly-wide vector units on commodity processors and accelerators. This hardware is designed to efficiently execute data-parallel computations in a vectorized manner. However, many algorithms are more naturally expressed as divide-and-conquer, recursive, task-parallel computations; in the absence of data parallelism, it seems that such algorithms are not well-suited to throughput-oriented architectures. This paper presents a set of novel code transformations that expose the data-parallelism latent in recursive, task-parallel programs. These transformations facilitate straightforward vectorization of task-parallel programs on commodity hardware. We also present scheduling policies that maintain high utilization of vector resources while limiting space usage. Across several task-parallel benchmarks, we demonstrate both efficient vector resource utilization and substantial speedup on chips using Intel's SSE4.2 vector units as well as accelerators using Intel's AVX512 units.

OSTI does not have a digital full-text copy available.

Research Organization: Pacific Northwest National Laboratory (PNNL), Richland, WA (US)

Sponsoring Organization: USDOE

DOE Contract Number: AC05-76RL01830

OSTI ID: 1194297

Report Number(s): PNNL-SA-107984; KJ0402000

Country of Publication: United States

Language: English

Similar Records

Extracting SIMD Parallelism from Recursive Task-Parallel Programs

Journal Article · Sun Dec 01 23:00:00 EST 2019 · ACM Transactions on Parallel Computing · OSTI ID:1592696

Exploiting Vector and Multicore Parallelism for Recursive, Data- and Task-Parallel Programs

Conference · Wed Jan 25 23:00:00 EST 2017 · OSTI ID:1349171

A Length Adaptive Algorithm-Hardware Co-design of Transformer on FPGA Through Sparse Attention and Dynamic Pipelining

Conference · Tue Aug 23 00:00:00 EDT 2022 · OSTI ID:1891848

Related Subjects

recursion
simd
vectorization
