MassiveGNN: Efficient Training via Prefetching for Massively Connected Distributed Graphs

Sarkar, Aishwarya; Ghosh, Sayan; Tallent, Nathan R.; Jannesari, Ali

doi:10.1109/CLUSTER59578.2024.00013

Conference · Thu Nov 07 04:00:00 EST 2024

DOI: https://doi.org/10.1109/CLUSTER59578.2024.00013 · OSTI ID:2479155

Sarkar, Aishwarya ^[1]; Ghosh, Sayan ^[1]; Tallent, Nathan R. ^[1]; Jannesari, Ali ^[2]

[1] BATTELLE (PACIFIC NW LAB)
[2] Iowa State University

Graph Neural Networks (GNNs) are indispensable in learning from graph-structured data, yet their rising computational costs, especially on massively connected graphs, pose significant challenges in terms of execution performance. To tackle this, distributed-memory solutions, such as partitioning the graph to concurrently train multiple replicas of GNNs, are used in practice. However, approaches requiring a partitioned graph usually suffer from communication overhead and load imbalance, even under optimal partitioning and communication strategies, due to irregularities in the neighborhood minibatch sampling. This paper proposes practical trade-offs for improving the sampling and communication overheads for representation learning on distributed graphs (using the popular GraphSAGE architecture) by developing a parameterized prefetch and eviction scheme on top of the state-of-the-art Amazon DistDGL distributed GNN framework, demonstrating about 15–40% improvement in end-to-end training performance on the NERSC Perlmutter supercomputer for various OGB datasets.
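
To make the idea of a parameterized prefetch and eviction scheme more concrete, the following is a minimal, generic sketch in Python. It is not the paper's algorithm and does not use any DistDGL API: the class `PrefetchBuffer`, the `fetch_remote` callback, and the parameters `capacity`, `decay`, and `evict_below` are all hypothetical names chosen for illustration. The sketch assumes each trainer keeps a small buffer of features for nodes owned by other partitions, scores entries by recent use, and periodically swaps stale entries for nodes that were recently missed.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List


class PrefetchBuffer:
    """Toy per-trainer buffer of remote node features (illustrative only)."""

    def __init__(self, fetch_remote: Callable[[List[int]], Dict[int, list]],
                 capacity: int, decay: float = 0.5, evict_below: float = 0.6):
        self.fetch_remote = fetch_remote  # hypothetical stand-in for a remote feature pull
        self.capacity = capacity          # maximum number of buffered nodes
        self.decay = decay                # score multiplier for entries unused in a minibatch
        self.evict_below = evict_below    # entries scoring below this are evicted
        self.features: Dict[int, list] = {}
        self.scores: Dict[int, float] = {}
        self.miss_counts: Counter = Counter()
        self.hits = 0
        self.misses = 0

    def prefetch(self, node_ids: Iterable[int]) -> None:
        """Fetch features for nodes not yet buffered, up to the free capacity."""
        room = max(self.capacity - len(self.features), 0)
        new_ids = dict.fromkeys(n for n in node_ids if n not in self.features)
        wanted = list(new_ids)[:room]
        if wanted:
            for node, feat in self.fetch_remote(wanted).items():
                self.features[node] = feat
                self.scores[node] = 1.0

    def lookup(self, sampled_ids: Iterable[int]) -> Dict[int, list]:
        """Return features for one minibatch's remote nodes, fetching only misses."""
        sampled = set(sampled_ids)
        missing = sorted(n for n in sampled if n not in self.features)
        self.hits += len(sampled) - len(missing)
        self.misses += len(missing)
        self.miss_counts.update(missing)
        # Entries used in this minibatch are refreshed; unused ones decay.
        for node in self.features:
            self.scores[node] = 1.0 if node in sampled else self.scores[node] * self.decay
        out = {n: self.features[n] for n in sampled if n in self.features}
        if missing:
            out.update(self.fetch_remote(missing))
        return out

    def evict_and_refill(self) -> None:
        """Drop stale entries, then prefetch the most frequently missed nodes."""
        stale = [n for n, s in self.scores.items() if s < self.evict_below]
        for node in stale:
            del self.features[node]
            del self.scores[node]
        hot = [n for n, _ in self.miss_counts.most_common()]
        self.prefetch(hot)
        self.miss_counts.clear()
```

In a sketch like this, `decay`, `evict_below`, and how often `evict_and_refill` is called are the knobs that trade buffer memory and bookkeeping against remote fetches; the actual parameters and policies used in the paper are not described in this record.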

OSTI does not have a digital full text copy available. For more information, please see document availability, search WorldCat, or search Google Scholar.

Research Organization: Pacific Northwest National Laboratory (PNNL), Richland, WA (United States)

Sponsoring Organization: USDOE

DOE Contract Number: AC05-76RL01830

OSTI ID: 2479155

Report Number(s): PNNL-SA-200893

Country of Publication: United States

Language: English

Similar Records

GSplit: Scaling Graph Neural Network Training on Large Graphs via Split-Parallelism

Conference · Thu May 01 00:00:00 EDT 2025 · OSTI ID:3002431

DDStore: Distributed Data Store for Scalable Training of Graph Neural Networks on Large Atomistic Modeling Datasets

Conference · Wed Nov 01 00:00:00 EDT 2023 · OSTI ID:2251635

MDLoader: A Hybrid Model-Driven Data Loader for Distributed Graph Neural Network Training

Conference · Fri Nov 01 00:00:00 EDT 2024 · OSTI ID:2538248

Related Subjects

Machine Learning
high performance computing
graph neural networks
