Accelerating GNNs on GPU Sparse Tensor Cores through N:M Sparsity-Oriented Graph Reordering

Chen, Jou-An; Sung, Hsin-Hsuan; Zhang, Ruifeng; Li, Ang; Shen, Xipeng

doi:10.1145/3710848.3710881

Conference · Fri Feb 28 04:00:00 EST 2025

DOI: https://doi.org/10.1145/3710848.3710881 · OSTI ID: 2524569

Chen, Jou-An [1]; Sung, Hsin-Hsuan; Zhang, Ruifeng; Li, Ang; Shen, Xipeng

[1] North Carolina State University

Recent GPUs have introduced Sparse Tensor Cores (SPTC) to accelerate computations on sparse matrices that meet N:M sparse patterns. Software tools extend this support to more general V:N:M patterns. Graphs in Graph Neural Networks (GNNs) are typically sparse, but their sparsity is often irregular and does not conform to the required V:N:M sparse patterns. This paper proposes a novel graph reordering algorithm that transforms irregular graph data into the required sparse patterns so that GNNs can benefit from SPTC. The optimization is lossless, so it maintains the accuracy of the GNN. It also preserves the symmetry of the graphs' adjacency matrices, so the same matrices remain compatible with many symmetry-based graph algorithms. The optimization removes 98-100% of the violations of the N:M sparse patterns at the vector level and increases the portion of conforming graphs in the SuiteSparse collection from 5-9% to 88.7-93.5%. On A100 GPUs, the optimization accelerates sparse matrix-matrix multiplication (SpMM) by up to 43X (a geomean speedup of 2.3X-7.5X) over cuSPARSE and speeds up the key graph operations in GNNs on real graphs by as much as 8.6X (3.5X on average).
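
The idea of an N:M pattern and of a symmetry-preserving reordering can be illustrated with a small sketch. It assumes the plain row-wise reading of N:M (at most N nonzeros in every group of M consecutive entries of a row) with N=2 and M=4 chosen for illustration; it does not model the V:N:M vector-level format, and the permutation is picked by hand rather than produced by the paper's algorithm. All function names (`count_nm_violations`, `symmetric_permute`, `build_adjacency`, `is_symmetric`) are hypothetical helpers, not part of any library.

```python
def count_nm_violations(matrix, n, m):
    """Count row groups of m consecutive entries that hold more than n nonzeros."""
    violations = 0
    for row in matrix:
        for start in range(0, len(row), m):
            group = row[start:start + m]
            if sum(1 for x in group if x != 0) > n:
                violations += 1
    return violations


def symmetric_permute(matrix, perm):
    """Relabel vertices: new vertex i is old vertex perm[i], for rows and columns alike."""
    return [[matrix[pi][pj] for pj in perm] for pi in perm]


def is_symmetric(matrix):
    """Check that the matrix equals its transpose."""
    size = len(matrix)
    return all(matrix[i][j] == matrix[j][i] for i in range(size) for j in range(size))


def build_adjacency(num_vertices, edges):
    """Build a dense 0/1 adjacency matrix of an undirected graph."""
    adj = [[0] * num_vertices for _ in range(num_vertices)]
    for u, v in edges:
        adj[u][v] = 1
        adj[v][u] = 1
    return adj


# Toy graph (an assumption for illustration): vertex 0 has neighbors 1, 2, 3,
# so row 0 has three nonzeros inside columns 0..3 and breaks the 2:4 pattern.
adjacency = build_adjacency(8, [(0, 1), (0, 2), (0, 3)])

# Hand-picked relabeling that swaps vertices 3 and 4, moving one neighbor of
# vertex 0 into the second column group.
perm = [0, 1, 2, 4, 3, 5, 6, 7]
reordered = symmetric_permute(adjacency, perm)

before = count_nm_violations(adjacency, 2, 4)
after = count_nm_violations(reordered, 2, 4)
print(before, after, is_symmetric(reordered))
```

Because the same permutation is applied to rows and columns, the reordered matrix describes the same graph with relabeled vertices: no edge is added or dropped, and symmetry is kept. That is the sense in which a reordering of this kind is lossless.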

Research Organization: North Carolina State University

Sponsoring Organization: USDOE Office of Energy Efficiency and Renewable Energy (EERE), Renewable Power Office. Solar Energy Technologies Office

DOE Contract Number: EE0009357

OSTI ID: 2524569

Country of Publication: United States

Language: English
