Accelerating matrix-centric graph processing on GPUs through bit-level optimizations

Chen, Jou-An; Sung, Hsin-Hsuan; Shen, Xipeng; Tallent, Nathan R.; Barker, Kevin J.; Li, Ang

doi:10.1016/j.jpdc.2023.02.013

Journal Article · Fri Mar 03 23:00:00 EST 2023 · Journal of Parallel and Distributed Computing

DOI: https://doi.org/10.1016/j.jpdc.2023.02.013 · OSTI ID: 1968852

Chen, Jou-An ^[1]; Sung, Hsin-Hsuan ^[1]; Shen, Xipeng ^[1]; Tallent, Nathan R. ^[2]; Barker, Kevin J. ^[2]; Li, Ang ^[2]

^[1] North Carolina State Univ., Raleigh, NC (United States)
^[2] Pacific Northwest National Lab. (PNNL), Richland, WA (United States)

Although binary values are known to be common in graph applications (e.g., in adjacency matrices), how to exploit them for efficiency has not yet been adequately explored. This paper presents a systematic study of how to unlock the potential of bit-level optimizations for graph computations that involve binary values. It proposes a two-level representation named Bit-Block Compressed Sparse Row (B2SR) and presents a series of optimizations to the graph operations on B2SR that use the intrinsics of modern GPUs. It also introduces Deep Reinforcement Learning (DRL) as an efficient way to configure the bit-level optimizations on the fly; the DQN-based adaptive tile size selector, with dedicated model training, can reach 68% prediction accuracy. Evaluations on NVIDIA Pascal and Volta GPUs show that the optimizations bring speedups of up to 40× and 6555× for the essential GraphBLAS kernels SpMV and SpGEMM, respectively, accelerating GraphBLAS-based BFS by up to 433×, SSSP, PR, and CC by up to 35×, and TC by up to 52×.
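
The idea of a two-level, bit-packed sparse format can be illustrated with a small CPU-side sketch. This is not the paper's B2SR layout or its GPU kernels: the tile edge of 4, the row-major bitmask layout, and the names `pack_bit_tiles`, `pack_vector`, and `bit_spmv` are all assumptions made for illustration. The upper level is an ordinary CSR index over non-empty tiles; the lower level stores each tile as one bitmask per tile row, so a binary SpMV reduces to a bitwise AND followed by a population count (the role that an intrinsic such as `__popc` would play on a GPU).

```python
TILE = 4  # assumed tile edge, chosen only to keep the example small


def pack_bit_tiles(dense, tile=TILE):
    """Pack a square 0/1 matrix into a CSR index over non-empty bit tiles.

    Returns (row_ptr, col_idx, blocks). Each entry of `blocks` is a list of
    `tile` integers, one bitmask per tile row; bit c marks column c of the tile.
    """
    n = len(dense)
    nb = (n + tile - 1) // tile  # number of tile rows and tile columns
    row_ptr, col_idx, blocks = [0], [], []
    for bi in range(nb):
        for bj in range(nb):
            masks = []
            for r in range(tile):
                i = bi * tile + r
                mask = 0
                for c in range(tile):
                    j = bj * tile + c
                    if i < n and j < n and dense[i][j]:
                        mask |= 1 << c
                masks.append(mask)
            if any(masks):  # empty tiles are not stored at all
                col_idx.append(bj)
                blocks.append(masks)
        row_ptr.append(len(col_idx))
    return row_ptr, col_idx, blocks


def pack_vector(x, tile=TILE):
    """Pack a 0/1 vector into one bitmask per tile-sized segment."""
    segs = [0] * ((len(x) + tile - 1) // tile)
    for j, v in enumerate(x):
        if v:
            segs[j // tile] |= 1 << (j % tile)
    return segs


def bit_spmv(row_ptr, col_idx, blocks, x_segs, n, tile=TILE):
    """Compute y[i] = |{j : A[i][j] = 1 and x[j] = 1}| with AND + popcount."""
    y = [0] * n
    for bi in range(len(row_ptr) - 1):
        for k in range(row_ptr[bi], row_ptr[bi + 1]):
            seg = x_segs[col_idx[k]]
            for r, mask in enumerate(blocks[k]):
                i = bi * tile + r
                if i < n:
                    # popcount of the AND counts matching 1-bits in this tile row
                    y[i] += bin(mask & seg).count("1")
    return y
```

With an adjacency matrix `A` and a 0/1 frontier vector `x`, calling `bit_spmv(*pack_bit_tiles(A), pack_vector(x), len(A))` should give the same counts as a dense matrix-vector product. The tile edge is the kind of parameter that the abstract's adaptive tile size selector would choose; here it is simply fixed.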

Research Organization: Pacific Northwest National Laboratory (PNNL), Richland, WA (United States)

Sponsoring Organization: National Science Foundation (NSF); USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)

Grant/Contract Number: AC05-76RL01830; SC0021293; EE0009357

OSTI ID: 1968852

Alternate ID(s): OSTI ID: 1962681

Report Number(s): PNNL-SA-179122

Journal Information: Journal of Parallel and Distributed Computing, Vol. 177; ISSN 0743-7315

Publisher: Elsevier

Country of Publication: United States

Language: English

Related Subjects

79 ASTRONOMY AND ASTROPHYSICS
Bit manipulation
Deep reinforcement learning
GPU
GraphBLAS
Sparse matrix
