DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Modeling of molecular atomization energies using machine learning

Abstract

Atomization energies are an important measure of chemical stability. Machine learning is used to model the atomization energies of a diverse set of organic molecules, based on nuclear charges and atomic positions only. Our scheme maps the problem of solving the molecular time-independent Schrödinger equation onto a non-linear statistical regression problem. Kernel ridge regression models are trained on, and compared to, reference atomization energies computed with density functional theory (the PBE0 approximation at the Kohn-Sham level of theory). We use a diagonalized matrix representation of molecules, based on the inter-nuclear Coulomb repulsion operator, in conjunction with a Gaussian kernel. Validation on a set of over 7000 small organic molecules from the GDB database yields a mean absolute error of ~10 kcal/mol, while reducing computational effort by several orders of magnitude. Applicability is demonstrated for the prediction of binding energy curves using augmentation samples based on physical limits.
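The abstract compresses the whole pipeline into a few sentences: build a Coulomb-repulsion matrix from nuclear charges and positions, reduce it to its eigenvalues, and regress energies with a Gaussian-kernel ridge model. The NumPy sketch that follows illustrates that pipeline under stated assumptions. The matrix entries (0.5 Z_i^2.4 on the diagonal, Z_i Z_j / |R_i - R_j| off the diagonal), the sorting of eigenvalues by decreasing magnitude with zero-padding to a fixed length, and the kernel form k(x, x') = exp(-|x - x'|^2 / (2 sigma^2)) are assumptions taken from the companion Physical Review Letters paper, not from this abstract. All function names are hypothetical, position units are left to the caller, and sigma and lam are placeholder hyperparameters.

```python
import numpy as np


def coulomb_matrix(charges, positions):
    """Assumed form: 0.5 * Z_i**2.4 on the diagonal, Z_i * Z_j / |R_i - R_j| off it."""
    Z = np.asarray(charges, dtype=float)
    R = np.asarray(positions, dtype=float)
    # Pairwise inter-nuclear distances |R_i - R_j|
    dist = np.linalg.norm(R[:, None, :] - R[None, :, :], axis=-1)
    with np.errstate(divide="ignore"):
        M = np.outer(Z, Z) / dist  # diagonal becomes inf here and is overwritten next
    np.fill_diagonal(M, 0.5 * Z ** 2.4)
    return M


def eigen_descriptor(charges, positions, size):
    """Coulomb-matrix eigenvalues sorted by decreasing magnitude, zero-padded to `size` (>= atom count)."""
    eig = np.linalg.eigvalsh(coulomb_matrix(charges, positions))  # matrix is symmetric
    eig = eig[np.argsort(-np.abs(eig))]
    out = np.zeros(size)
    out[: len(eig)] = eig
    return out


def gaussian_kernel(A, B, sigma):
    """k(x, x') = exp(-||x - x'||^2 / (2 sigma^2)) for every row of A against every row of B."""
    sq = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))


def krr_fit(X, y, sigma, lam):
    """Kernel ridge regression: solve (K + lam * I) alpha = y for the coefficients alpha."""
    K = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)


def krr_predict(X_train, alpha, X_new, sigma):
    """Predicted energy E(x) = sum_i alpha_i * k(x, x_i)."""
    return gaussian_kernel(X_new, X_train, sigma) @ alpha


def mean_absolute_error(pred, ref):
    """Mean absolute deviation between predictions and reference values."""
    return float(np.mean(np.abs(np.asarray(pred) - np.asarray(ref))))


if __name__ == "__main__":
    # For H2 at unit separation the matrix is [[0.5, 1], [1, 0.5]],
    # whose eigenvalues are 1.5 and -0.5; the third slot is zero padding.
    print(eigen_descriptor([1, 1], [[0, 0, 0], [0, 0, 1]], size=3))
```

Because the eigenvalues of a symmetric matrix do not change when atoms are relabeled, the descriptor is invariant to atom ordering, which is one motivation for diagonalizing rather than using the raw matrix. In an actual study, sigma and lam would be tuned (for example by cross-validation) and the targets would be the PBE0 reference atomization energies.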

Authors:
 Rupp, Matthias [1]; Tkatchenko, Alexandre [2]; Müller, Klaus-Robert [1]; von Lilienfeld, O. Anatole [3]
  1. Technical Univ. of Berlin (Germany). Machine Learning Group
  2. Max-Planck Society, Berlin (Germany). Fritz-Haber-Inst.
  3. Argonne National Lab. (ANL), Argonne, IL (United States)
Publication Date:
May 1, 2012
Research Org.:
Argonne National Laboratory (ANL), Argonne, IL (United States)
Sponsoring Org.:
USDOE Office of Science (SC)
OSTI Identifier:
1629374
Resource Type:
Accepted Manuscript
Journal Name:
Journal of Cheminformatics
Additional Journal Information:
Journal Volume: 4; Journal Issue: S1; Conference: 7th German Conference on Chemoinformatics: 25th CIC-Workshop, Goslar (Germany), 6-8 Nov 2011; Journal ID: ISSN 1758-2946
Publisher:
Chemistry Central Ltd.
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; Density Functional Theory; Machine Learning; Organic Molecule; Gaussian Kernel; Energy Curve

Citation Formats

Rupp, Matthias, Tkatchenko, Alexandre, Müller, Klaus-Robert, and von Lilienfeld, O. Anatole. Modeling of molecular atomization energies using machine learning. United States: N. p., 2012. Web. doi:10.1186/1758-2946-4-S1-P33.
Rupp, Matthias, Tkatchenko, Alexandre, Müller, Klaus-Robert, & von Lilienfeld, O. Anatole (2012). Modeling of molecular atomization energies using machine learning. United States. https://doi.org/10.1186/1758-2946-4-S1-P33
Rupp, Matthias, Tkatchenko, Alexandre, Müller, Klaus-Robert, and von Lilienfeld, O. Anatole. 2012. "Modeling of molecular atomization energies using machine learning". United States. https://doi.org/10.1186/1758-2946-4-S1-P33. https://www.osti.gov/servlets/purl/1629374.
@article{osti_1629374,
title = {Modeling of molecular atomization energies using machine learning},
author = {Rupp, Matthias and Tkatchenko, Alexandre and Müller, Klaus-Robert and von Lilienfeld, O. Anatole},
abstractNote = {Atomization energies are an important measure of chemical stability. Machine learning is used to model atomization energies of a diverse set of organic molecules, based on nuclear charges and atomic positions only. Our scheme maps the problem of solving the molecular time-independent Schrödinger equation onto a non-linear statistical regression problem. Kernel ridge regression models are trained on and compared to reference atomization energies computed using density functional theory (PBE0 approximation to Kohn-Sham level of theory). We use a diagonalized matrix representation of molecules based on the inter-nuclear Coulomb repulsion operator in conjunction with a Gaussian kernel. Validation on a set of over 7000 small organic molecules from the GDB database yields a mean absolute error of ~10 kcal/mol, while reducing computational effort by several orders of magnitude. Applicability is demonstrated for prediction of binding energy curves using augmentation samples based on physical limits.},
doi = {10.1186/1758-2946-4-S1-P33},
journal = {Journal of Cheminformatics},
number = {S1},
volume = {4},
place = {United States},
year = {2012},
month = may
}

Works referenced in this record:

Fast and Accurate Modeling of Molecular Atomization Energies with Machine Learning
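journal, January 2012

  • Rupp, Matthias; Tkatchenko, Alexandre; Müller, Klaus-Robert; von Lilienfeld, O. Anatole
  • Physical Review Letters, Vol. 108, Issue 5
  • DOI: 10.1103/PhysRevLett.108.058301
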
Rationale for mixing exact exchange with density functional approximations
journal, December 1996

  • Perdew, John P.; Ernzerhof, Matthias; Burke, Kieron
  • The Journal of Chemical Physics, Vol. 105, Issue 22, p. 9982-9985
  • DOI: 10.1063/1.472933

The Elements of Statistical Learning
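book, January 2009

  • Hastie, Trevor; Tibshirani, Robert; Friedman, Jerome
  • Springer Series in Statistics
  • DOI: 10.1007/978-0-387-84858-7
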
970 Million Druglike Small Molecules for Virtual Screening in the Chemical Universe Database GDB-13
journal, July 2009

  • Blum, Lorenz C.; Reymond, Jean-Louis
  • Journal of the American Chemical Society, Vol. 131, Issue 25
  • DOI: 10.1021/ja902302h

Inhomogeneous Electron Gas
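journal, November 1964

  • Hohenberg, P.; Kohn, W.
  • Physical Review, Vol. 136, Issue 3B, p. B864-B871
  • DOI: 10.1103/PhysRev.136.B864
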
Self-Consistent Equations Including Exchange and Correlation Effects
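journal, November 1965

  • Kohn, W.; Sham, L. J.
  • Physical Review, Vol. 140, Issue 4A, p. A1133-A1138
  • DOI: 10.1103/PhysRev.140.A1133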