OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Scalable training of graph convolutional neural networks for fast and accurate predictions of HOMO-LUMO gap in molecules

Journal Article · Journal of Cheminformatics

Abstract:
Graph convolutional neural networks (GCNNs) are a popular class of deep learning (DL) models in materials science for predicting material properties from the graph representation of molecular structures. Training an accurate and comprehensive GCNN surrogate for molecular design requires large-scale graph datasets and is usually a time-consuming process. Recent advances in GPUs and distributed computing open a path to effectively reduce the computational cost of GCNN training. However, efficient utilization of high-performance computing (HPC) resources for training requires simultaneously optimizing large-scale data management and scalable stochastic batched optimization techniques. In this work, we focus on building GCNN models on HPC systems to predict material properties of millions of molecules. We use HydraGNN, our in-house library for large-scale GCNN training, which leverages distributed data parallelism in PyTorch. We use ADIOS, a high-performance data management framework, for efficient storage and reading of large molecular graph data. We perform parallel training on two open-source large-scale graph datasets to build a GCNN predictor for an important quantum property known as the HOMO-LUMO gap. We measure the scalability, accuracy, and convergence of our approach on two DOE supercomputers: the Summit supercomputer at the Oak Ridge Leadership Computing Facility (OLCF) and the Perlmutter system at the National Energy Research Scientific Computing Center (NERSC). We present our experimental results with HydraGNN showing (i) a reduction of data loading time by up to 4.2 times compared with a conventional method and (ii) linear scaling performance for training on up to 1024 GPUs on both Summit and Perlmutter.
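
The distributed data parallelism mentioned in the abstract relies on each GPU process (rank) training on its own slice of the dataset. The sketch below illustrates one common way to do that split, round-robin sharding with wrap-around padding, which is similar in spirit to PyTorch's `DistributedSampler` without shuffling. It is not taken from HydraGNN; the function name `shard_indices` and its parameters are hypothetical and chosen only for illustration.

```python
import math


def shard_indices(num_samples, world_size, rank):
    """Return the sample indices assigned to `rank` (hypothetical helper).

    Samples are dealt out round-robin across `world_size` ranks. If the
    dataset size is not divisible by `world_size`, indices wrap around to
    the start so that every rank receives the same number of samples.
    """
    if world_size < 1 or not 0 <= rank < world_size:
        raise ValueError("rank must satisfy 0 <= rank < world_size")

    # Equal per-rank sample count, rounded up.
    per_rank = math.ceil(num_samples / world_size)
    total = per_rank * world_size

    # Padded index list: positions past the end wrap back to the beginning.
    padded = [i % num_samples for i in range(total)]

    # Rank r takes positions r, r + world_size, r + 2 * world_size, ...
    return padded[rank:total:world_size]


if __name__ == "__main__":
    # Toy example: 10 molecular graphs split across 4 ranks.
    for r in range(4):
        print(r, shard_indices(10, 4, r))
```

Because every rank holds the same number of samples, all ranks execute the same number of optimizer steps per epoch, which is what synchronous gradient averaging requires. The padding means a few samples are seen twice per epoch when the dataset size is not a multiple of the number of ranks.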

Research Organization:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States); Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States). National Energy Research Scientific Computing Center (NERSC)
Sponsoring Organization:
USDOE Office of Science (SC); USDOE Laboratory Directed Research and Development (LDRD) Program
Grant/Contract Number:
AC05-00OR22725; CSC457; MAT250; AC02-05CH11231; ASCR-ERCAP-m4133
OSTI ID:
1893210
Alternate ID(s):
OSTI ID: 1901638
Journal Information:
Journal of Cheminformatics, Vol. 14, Issue 1; ISSN 1758-2946
Publisher:
Springer Science + Business Media
Country of Publication:
United Kingdom
Language:
English
