U.S. Department of Energy
Office of Scientific and Technical Information

Reducing Communication in Graph Neural Network Training

Journal Article · International Conference for High Performance Computing, Networking, Storage and Analysis
 Tripathy, Alok [1]; Yelick, Katherine [1]; Buluç, Aydın [1]
  1. Univ. of California, Berkeley, CA (United States); Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Graph Neural Networks (GNNs) are powerful and flexible neural networks that use the naturally sparse connectivity information of the data. GNNs represent this connectivity as sparse matrices, which have lower arithmetic intensity and thus higher communication costs than dense matrices, making GNNs harder to scale to high concurrencies than convolutional or fully-connected neural networks. Here, we introduce a family of parallel algorithms for training GNNs and show that they can asymptotically reduce communication compared to previous parallel GNN training methods. We implement these algorithms, which are based on 1D, 1.5D, 2D, and 3D sparse-dense matrix multiplication, using torch.distributed on GPU-equipped clusters. Our algorithms optimize communication across the full GNN training pipeline. We train GNNs on over a hundred GPUs on multiple datasets, including a protein network with over a billion edges.
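
The 1D variant of such an algorithm family can be illustrated as a block-row sparse-dense matrix multiply (SpMM) driven by torch.distributed broadcasts. The sketch below is not the authors' implementation: it assumes an already-initialized GPU (NCCL) process group, equally sized blocks on every rank, and hypothetical names (spmm_1d, A_local_blocks, H_local) chosen purely for illustration.

import torch
import torch.distributed as dist

def spmm_1d(A_local_blocks, H_local):
    """Compute this rank's block row of Z = A @ H in P broadcast stages.

    A_local_blocks: list of P sparse tensors, the column blocks of this
                    rank's block row of the adjacency matrix A.
    H_local:        this rank's block row of the dense feature matrix H.
    Assumes every rank holds blocks of identical shape.
    """
    P = dist.get_world_size()
    Z_local = torch.zeros(A_local_blocks[0].shape[0], H_local.shape[1],
                          device=H_local.device)
    for j in range(P):
        # Stage j: rank j broadcasts its block row of H to all ranks.
        H_j = H_local.clone() if dist.get_rank() == j else torch.empty_like(H_local)
        dist.broadcast(H_j, src=j)
        # Multiply the matching sparse column block by the received dense block.
        Z_local += torch.sparse.mm(A_local_blocks[j], H_j)
    return Z_local

In this 1D scheme each rank receives essentially all of the dense feature matrix per layer, so per-rank communication volume scales with the number of vertices times the feature width rather than with the number of edges; the 1.5D, 2D, and 3D variants described in the paper partition along additional dimensions to reduce this volume asymptotically.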
Research Organization:
Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States); Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF)
Sponsoring Organization:
National Science Foundation (NSF); USDOE National Nuclear Security Administration (NNSA); USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)
Grant/Contract Number:
AC02-05CH11231; AC05-00OR22725
OSTI ID:
1772909
Alternate ID(s):
OSTI ID: 1647608
Journal Information:
International Conference for High Performance Computing, Networking, Storage and Analysis, Vol. 2020; ISSN 2167-4329
Publisher:
IEEE
Country of Publication:
United States
Language:
English

Similar Records

Reducing Communication in Graph Neural Network Training
Conference · November 2020 · SC '20: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis · OSTI ID: 1647608

GSplit: Scaling Graph Neural Network Training on Large Graphs via Split-Parallelism
Conference · May 2025 · OSTI ID: 3002431

Scalable training of trustworthy and energy-efficient predictive graph foundation models for atomistic materials modeling: a case study with HydraGNN
Journal Article · March 2025 · Journal of Supercomputing · OSTI ID: 2538215