DDStore: Distributed Data Store for Scalable Training of Graph Neural Networks on Large Atomistic Modeling Datasets
- ORNL
- Lawrence Berkeley National Laboratory (LBNL)
Graph neural networks (GNNs) are a class of deep learning models used in designing atomistic materials for effective screening of large chemical spaces. To ensure robust predictions, GNN models must be trained on large volumes of atomistic data on leadership-class supercomputers. Even with the advent of modern architectures that provide multiple storage layers, including node-local NVMe devices in addition to device memory for caching large datasets, extreme-scale model training faces I/O challenges at scale. We present DDStore, an in-memory distributed data store designed for GNN training on large-scale graph data. DDStore provides a hierarchical, distributed data-caching technique that combines data chunking, replication, low-latency random access, and high-throughput communication. DDStore achieves near-linear scaling for training a GNN model using up to 1,000 GPUs on the Summit and Perlmutter supercomputers, and reaches up to a 6.15x reduction in GNN training time compared to state-of-the-art methodologies.
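The core idea described in the abstract, partitioning a dataset into per-rank chunks, replicating chunks across ranks, and serving random-access reads from whichever rank owns (or replicates) a sample, can be illustrated with a minimal sketch. This is an assumption-laden toy model, not the actual DDStore API: the class name `ChunkedStore`, its methods, and the single-process simulation of ranks are all invented for illustration; the real system uses MPI-based communication between processes.

```python
# Illustrative sketch of chunking + replication for a distributed
# data store. NOT the DDStore API; all names here are hypothetical.
# Ranks are simulated as dictionary entries in one process.

class ChunkedStore:
    """Partitions a dataset into contiguous chunks, one per rank,
    and replicates each chunk on `replication` consecutive ranks."""

    def __init__(self, dataset, nranks, replication=1):
        self.nranks = nranks
        # ceiling division so every sample is assigned to a chunk
        self.chunk_size = (len(dataset) + nranks - 1) // nranks
        # rank -> {global index: sample} held in that rank's memory
        self.local = {r: {} for r in range(nranks)}
        for idx, sample in enumerate(dataset):
            owner = idx // self.chunk_size
            # replicate on consecutive ranks so a reader can often
            # satisfy a random access from its own memory
            for k in range(replication):
                self.local[(owner + k) % nranks][idx] = sample

    def owner_of(self, idx):
        """Rank that owns the chunk containing sample `idx`."""
        return (idx // self.chunk_size) % self.nranks

    def get(self, reader_rank, idx):
        """Random access: serve locally if a replica is present,
        otherwise model a remote fetch from the owner rank."""
        if idx in self.local[reader_rank]:
            return self.local[reader_rank][idx]
        return self.local[self.owner_of(idx)][idx]
```

For example, with 100 samples, 4 ranks, and a replication factor of 2, each rank holds its own 25-sample chunk plus its neighbor's, so roughly half of the random reads during shuffled training complete without any remote fetch.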
- Research Organization:
- Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
- Sponsoring Organization:
- USDOE
- DOE Contract Number:
- AC05-00OR22725
- OSTI ID:
- 2251635
- Country of Publication:
- United States
- Language:
- English
Similar Records
- Scalable training of trustworthy and energy-efficient predictive graph foundation models for atomistic materials modeling: a case study with HydraGNN
- Scaling Laws of Graph Neural Networks for Atomistic Materials Modeling · Journal Article · Journal of Supercomputing · March 2025 · OSTI ID: 2538215
- Scaling Laws of Graph Neural Networks for Atomistic Materials Modeling · Conference · May 2025 · OSTI ID: 3017033