GSplit: Scaling Graph Neural Network Training on Large Graphs via Split-Parallelism
Conference
Graph neural networks (GNNs), an emerging class of machine learning models for graphs, have gained popularity for their superior performance in various graph analytical tasks. Mini-batch training is commonly used to train GNNs on large graphs, and data parallelism is the standard approach to scale mini-batch training across multiple GPUs. Data-parallel approaches perform redundant work because the subgraphs sampled by different GPUs overlap significantly. To address this issue, we introduce a hybrid parallel mini-batch training paradigm called split parallelism. Split parallelism avoids redundant work by splitting the sampling, loading, and training of each mini-batch across multiple GPUs. Split parallelism, however, introduces communication overheads that can exceed the savings from eliminating redundant work. We further present a lightweight partitioning algorithm that probabilistically minimizes these overheads. We implement split parallelism in GSplit and show that it outperforms state-of-the-art mini-batch training systems such as DGL, Quiver, and P3.
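To make the redundancy argument concrete, below is a minimal sketch in plain Python (no GPU code). The helper names (sample_neighborhood, the hash-based owner assignment) are hypothetical stand-ins, not GSplit's actual API: the hash partition merely approximates the role of the paper's lightweight probabilistic partitioner. The sketch counts how many node loads data parallelism performs versus how many unique nodes the mini-batch actually contains, then shows the single-ownership idea behind split parallelism.

```python
# A minimal sketch (hypothetical helpers; not GSplit's actual implementation)
# contrasting data-parallel sampling, where per-GPU subgraphs overlap, with a
# split-parallel assignment, where each sampled node is owned by exactly one GPU.
import random
from collections import defaultdict

def sample_neighborhood(adj, seeds, fanout):
    """One-hop neighbor sampling: each seed keeps at most `fanout` neighbors."""
    sampled = set(seeds)
    for s in seeds:
        nbrs = adj.get(s, [])
        sampled.update(random.sample(nbrs, min(fanout, len(nbrs))))
    return sampled

random.seed(0)
# Toy graph: node -> neighbor list.
adj = {v: [u for u in range(64) if u != v and (u + v) % 7 == 0] for v in range(64)}
seeds = list(range(64))
num_gpus = 4

# Data parallelism: each GPU independently samples from its share of the seeds,
# so the same neighbor can be loaded and processed on several GPUs.
per_gpu = [sample_neighborhood(adj, seeds[g::num_gpus], fanout=4)
           for g in range(num_gpus)]
total = sum(len(s) for s in per_gpu)
unique = len(set().union(*per_gpu))
print(f"data-parallel: {total} node loads for {unique} unique nodes "
      f"({total - unique} redundant)")

# Split parallelism (conceptually): sample the mini-batch once, then assign
# each sampled node to a single owner GPU, so no node is loaded twice.
batch = sample_neighborhood(adj, seeds, fanout=4)
owner = defaultdict(set)
for v in batch:
    owner[hash(v) % num_gpus].add(v)  # stand-in for GSplit's partitioner
print(f"split-parallel: {sum(len(s) for s in owner.values())} node loads "
      f"for {len(batch)} unique nodes (0 redundant)")
```

The sketch deliberately omits what the split-parallel design must then pay for: nodes owned by one GPU whose neighbors live on another require cross-GPU communication during training, which is exactly the overhead the paper's partitioning algorithm aims to minimize.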
- Research Organization:
- Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
- Sponsoring Organization:
- USDOE
- DOE Contract Number:
- AC05-00OR22725
- OSTI ID:
- 3002431
- Country of Publication:
- United States
- Language:
- English
Similar Records
Reducing Communication in Graph Neural Network Training
Conference · November 2020 · SC '20: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis · OSTI ID:1647608
Reducing Communication in Graph Neural Network Training
Journal Article · November 2020 · International Conference for High Performance Computing, Networking, Storage and Analysis · OSTI ID:1772909
MassiveGNN: Efficient Training via Prefetching for Massively Connected Distributed Graphs
Conference · November 2024 · OSTI ID:2479155