Reducing Communication in Graph Neural Network Training
Journal Article · International Conference for High Performance Computing, Networking, Storage and Analysis
- Univ. of California, Berkeley, CA (United States); Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Graph Neural Networks (GNNs) are powerful and flexible neural networks that use the naturally sparse connectivity information of the data. GNNs represent this connectivity as sparse matrices, which have lower arithmetic intensity and thus higher communication costs compared to dense matrices, making GNNs harder to scale to high concurrencies than convolutional or fully-connected neural networks. Here, we introduce a family of parallel algorithms for training GNNs and show that they can asymptotically reduce communication compared to previous parallel GNN training methods. We implement these algorithms, which are based on 1D, 1.5D, 2D, and 3D sparse-dense matrix multiplication, using torch.distributed on GPU-equipped clusters. Our algorithms optimize communication across the full GNN training pipeline. We train GNNs on over a hundred GPUs on multiple datasets, including a protein network with over a billion edges.
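As an illustration of the sparse-dense matrix multiplication named in the abstract, the following is a minimal single-process sketch of a generic 1D block-row scheme. It is not the authors' implementation and does not use torch.distributed: the "ranks" are simulated with a loop, and the names `partition_rows`, `spmm_1d`, and `words_received` are hypothetical. It assumes both the sparse adjacency matrix A and the dense feature matrix H are split by rows, so each rank must receive every block of H it does not own before computing its rows of Z = A H.

```python
# Single-process simulation of a 1D block-row sparse-dense multiply (SpMM).
# Assumption: a generic textbook-style 1D scheme, not the paper's code.

def partition_rows(n, p):
    """Split rows 0..n-1 into p contiguous blocks whose sizes differ by at most 1."""
    base, extra = divmod(n, p)
    bounds, start = [], 0
    for r in range(p):
        size = base + (1 if r < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


def spmm_1d(edges, H, p):
    """Compute Z = A @ H, with A given as (row, col, value) triples.

    Returns (Z, words_received), where words_received counts the dense
    entries of H that the simulated ranks would receive from other ranks.
    """
    n, f = len(H), len(H[0])
    bounds = partition_rows(n, p)
    Z = [[0.0] * f for _ in range(n)]
    words_received = 0
    for rank, (lo, hi) in enumerate(bounds):
        # Nonzeros of A that fall in this rank's block of rows.
        local_edges = [(i, j, v) for (i, j, v) in edges if lo <= i < hi]
        # Simulated all-gather: count every block of H owned by another rank.
        for src, (src_lo, src_hi) in enumerate(bounds):
            if src != rank:
                words_received += (src_hi - src_lo) * f
        # Local SpMM: each nonzero A[i][j] adds v * H[j] into row i of Z.
        for i, j, v in local_edges:
            for k in range(f):
                Z[i][k] += v * H[j][k]
    return Z, words_received


# Hypothetical example: a 4-vertex path graph with 2 features per vertex.
edges = [(0, 1, 1.0), (1, 0, 1.0), (1, 2, 1.0),
         (2, 1, 1.0), (2, 3, 1.0), (3, 2, 1.0)]
H = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [3.0, 1.0]]
Z, words = spmm_1d(edges, H, p=2)
print(Z, words)
```

In this sketch the counted volume is (p - 1) * n * f dense entries in total, because every rank gathers all of H regardless of how sparse A is. That growth with the number of ranks is the kind of cost that motivates the 1.5D, 2D, and 3D variants mentioned in the abstract.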
- Research Organization:
- Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States); Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF)
- Sponsoring Organization:
- National Science Foundation (NSF); USDOE National Nuclear Security Administration (NNSA); USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)
- Grant/Contract Number:
- AC02-05CH11231; AC05-00OR22725
- OSTI ID:
- 1772909
- Alternate ID(s):
- OSTI ID: 1647608
- Journal Information:
- International Conference for High Performance Computing, Networking, Storage and Analysis, Vol. 2020; ISSN 2167-4329
- Publisher:
- IEEE
- Country of Publication:
- United States
- Language:
- English
Similar Records
Reducing Communication in Graph Neural Network Training
Conference · Sun Nov 01 00:00:00 EDT 2020 · SC '20: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis · OSTI ID: 1647608
GSplit: Scaling Graph Neural Network Training on Large Graphs via Split-Parallelism
Conference · Thu May 01 00:00:00 EDT 2025 · OSTI ID: 3002431
Scalable training of trustworthy and energy-efficient predictive graph foundation models for atomistic materials modeling: a case study with HydraGNN
Journal Article · Thu Mar 13 20:00:00 EDT 2025 · Journal of Supercomputing · OSTI ID: 2538215