MassiveGNN: Efficient Training via Prefetching for Massively Connected Distributed Graphs
- BATTELLE (PACIFIC NW LAB)
- Iowa State University
Graph Neural Networks (GNNs) are indispensable in learning from graph-structured data, yet their rising computational costs, especially on massively connected graphs, pose significant challenges in terms of execution performance. To tackle this, distributed-memory solutions, such as partitioning the graph to concurrently train multiple replicas of GNNs, are used in practice. However, approaches requiring a partitioned graph usually suffer from communication overhead and load imbalance, even under optimal partitioning and communication strategies, due to irregularities in the neighborhood minibatch sampling. This paper proposes practical trade-offs for reducing the sampling and communication overheads for representation learning on distributed graphs (using the popular GraphSAGE architecture) by developing a parameterized prefetch and eviction scheme on top of the state-of-the-art Amazon DistDGL distributed GNN framework, demonstrating about 15–40% improvement in end-to-end training performance on the NERSC Perlmutter supercomputer for various OGB datasets.
- Research Organization: Pacific Northwest National Laboratory (PNNL), Richland, WA (United States)
- Sponsoring Organization: USDOE
- DOE Contract Number: AC05-76RL01830
- OSTI ID: 2479155
- Report Number(s): PNNL-SA-200893
- Country of Publication: United States
- Language: English
Similar Records
GSplit: Scaling Graph Neural Network Training on Large Graphs via Split-Parallelism
Conference · Thu May 01 2025 · OSTI ID: 3002431
DDStore: Distributed Data Store for Scalable Training of Graph Neural Networks on Large Atomistic Modeling Datasets
Conference · Wed Nov 01 2023 · OSTI ID: 2251635
MDLoader: A Hybrid Model-Driven Data Loader for Distributed Graph Neural Network Training
Conference · Fri Nov 01 2024 · OSTI ID: 2538248