DOE PAGES · U.S. Department of Energy
Office of Scientific and Technical Information

Title: Prediction of atomization energy using graph kernel and active learning

Journal Article · Journal of Chemical Physics
DOI: https://doi.org/10.1063/1.5078640 · OSTI ID: 1526573

Data-driven prediction of molecular properties presents unique challenges to the design of machine learning methods concerning data structure/dimensionality, symmetry adaption, and confidence management. In this paper, we present a kernel-based pipeline that can learn and predict the atomization energy of molecules with high accuracy. The framework employs Gaussian process regression to perform predictions based on the similarity between molecules, which is computed using the marginalized graph kernel. To apply the marginalized graph kernel, a spatial adjacency rule is first employed to convert molecules into graphs whose vertices and edges are labeled by elements and interatomic distances, respectively. We then derive formulas for the efficient evaluation of the kernel. Specific functional components for the marginalized graph kernel are proposed, while the effects of the associated hyperparameters on accuracy and predictive confidence are examined. We show that the graph kernel is particularly suitable for predicting extensive properties because its convolutional structure coincides with that of the covariance formula between sums of random variables. Using an active learning procedure, we demonstrate that the proposed method can achieve a mean absolute error of 0.62 ± 0.01 kcal/mol using as few as 2000 training samples on the QM7 dataset.
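The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the distance cutoff in `molecule_to_graph`, the element-matching base kernel in `additive_kernel`, and all function names are assumptions made for illustration, and the marginalized graph kernel itself is replaced here by a much simpler sum over atom pairs. That sum has the same form as the covariance between two sums of random variables, Cov(Σᵢ Xᵢ, Σⱼ Yⱼ) = Σᵢ Σⱼ Cov(Xᵢ, Yⱼ), which is the structural property the abstract credits for the kernel's suitability to extensive properties.

```python
import numpy as np


def molecule_to_graph(elements, coords, cutoff=1.2):
    """Hypothetical spatial adjacency rule: link atoms closer than `cutoff`.

    Vertices are labeled by element, edges by interatomic distance.
    The cutoff value (in angstrom) is an illustrative assumption.
    """
    coords = np.asarray(coords, dtype=float)
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    n = len(elements)
    edges = {
        (i, j): float(dists[i, j])
        for i in range(n)
        for j in range(i + 1, n)
        if dists[i, j] < cutoff
    }
    return list(elements), edges


def additive_kernel(elements_a, elements_b):
    """Simplified stand-in for a convolutional molecule kernel.

    Sums a base kernel (1 if the elements match, else 0) over all atom
    pairs, mirroring Cov(sum_i X_i, sum_j Y_j) = sum_ij Cov(X_i, Y_j).
    This is NOT the marginalized graph kernel.
    """
    return float(sum(a == b for a in elements_a for b in elements_b))


def gpr_predict(K_train, y_train, K_cross, k_test_diag, noise=1e-8):
    """Gaussian process regression with precomputed kernel matrices.

    K_train: (n_train, n_train), K_cross: (n_test, n_train),
    k_test_diag: (n_test,) prior variances of the test points.
    Returns the predictive mean and variance for each test point.
    """
    K = np.asarray(K_train, dtype=float) + noise * np.eye(len(y_train))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, np.asarray(y_train, dtype=float)))
    K_cross = np.asarray(K_cross, dtype=float)
    mean = K_cross @ alpha
    v = np.linalg.solve(L, K_cross.T)
    var = np.asarray(k_test_diag, dtype=float) - np.sum(v * v, axis=0)
    return mean, var


# Illustrative geometry for water (approximate coordinates in angstrom).
water_elements, water_edges = molecule_to_graph(
    ["O", "H", "H"],
    [[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]],
)
k_water_h2 = additive_kernel(water_elements, ["H", "H"])
```

In an uncertainty-driven active learning loop, one common choice (assumed here, not taken from the paper) is to label next the candidate with the largest predictive variance, for example `np.argmax(var)` over a pool of unlabeled molecules.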

Research Organization: Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
Sponsoring Organization: USDOE Office of Science (SC)
Grant/Contract Number: AC02-05CH11231
OSTI ID: 1526573
Journal Information: Journal of Chemical Physics, Vol. 150, Issue 4; ISSN 0021-9606
Publisher: American Institute of Physics (AIP)
Country of Publication: United States
Language: English

Cited By (3)

Atomic structures and orbital energies of 61,489 crystal-forming organic molecules preprint January 2020
Atomic structures and orbital energies of 61,489 crystal-forming organic molecules journal February 2020
Constructing convex energy landscapes for atomistic structure optimization journal December 2019