DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Machine learning of molecular properties: Locality and active learning

Abstract

In recent years, machine learning techniques have shown great potential in various problems from a multitude of disciplines, including materials design and drug discovery. Their high computational speed, combined with accuracy comparable to that of density functional theory, makes machine learning algorithms efficient for high-throughput screening of chemical and configurational space. However, the machine learning algorithms available in the literature require large training datasets to reach chemical accuracy, and they also show large errors for the so-called outliers: out-of-sample molecules that are not well represented in the training set. In the present paper, we propose a new machine learning algorithm for predicting molecular properties that addresses these two issues. It is based on a local model of interatomic interactions, which provides high accuracy when trained on relatively small training sets, and on an active learning algorithm for optimally choosing the training set, which significantly reduces the errors for the outliers. We compare our model to other state-of-the-art algorithms from the literature on widely used benchmark tests.
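The abstract names two ingredients: a local model, in which a molecular property is assembled from contributions of individual atoms, and active learning, which chooses the training set so that outliers are covered. The sketch below illustrates both ideas under simplifying assumptions that are ours, not the paper's: each atom is described by a hypothetical fixed-length descriptor vector, the property is linear in those descriptors, and training molecules are chosen by a generic maximum-volume (D-optimality) criterion, a standard selection rule for linear models whose details in the paper may differ. The functions `molecule_features`, `fit_linear`, `extrapolation_grade`, and `select_active_set`, and the synthetic data, are all hypothetical.

```python
import numpy as np


def molecule_features(atom_descriptors):
    # Locality assumption (hypothetical model): the property is a sum of atomic
    # contributions, each linear in that atom's descriptor vector, so the
    # molecule's feature vector is the sum of its atoms' descriptors.
    return np.sum(atom_descriptors, axis=0)


def fit_linear(X, y):
    # Least-squares fit of the coefficients theta in y ~ X @ theta.
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta


def extrapolation_grade(A, x):
    # Write x as a combination of the rows of the square active set A
    # (x = A.T @ c). Replacing row j by x scales |det A| by |c[j]|, so a grade
    # above 1 flags x as an outlier that would enlarge the selected volume.
    c = np.linalg.solve(A.T, x)
    return float(np.max(np.abs(c)))


def select_active_set(X, tol=1.0 + 1e-6, max_swaps=1000):
    # Choose m = X.shape[1] rows of X whose square matrix has locally maximal |det|.
    n, m = X.shape
    # Initialization by pivoted Gram-Schmidt: repeatedly take the row with the
    # largest residual after projecting out the rows already chosen.
    R = X.astype(float)  # astype returns a copy, so X is not modified
    idx = []
    for _ in range(m):
        norms = np.linalg.norm(R, axis=1)
        norms[np.array(idx, dtype=int)] = -1.0  # never re-pick a chosen row
        i = int(np.argmax(norms))
        if norms[i] < 1e-12:
            raise ValueError("features are rank-deficient")
        idx.append(i)
        q = R[i] / norms[i]
        R -= np.outer(R @ q, q)
    # Swap phase (maxvol-style): column k of C holds the coefficients of X[k]
    # in terms of the active rows; swap while some |C[j, k]| exceeds tol.
    for _ in range(max_swaps):
        C = np.linalg.solve(X[idx].T, X.T)
        j, k = np.unravel_index(np.argmax(np.abs(C)), C.shape)
        if abs(C[j, k]) <= tol:
            break
        idx[j] = int(k)  # each swap strictly increases |det|
    return idx


# Synthetic, noise-free data (hypothetical): 200 molecules with 3 to 9 atoms
# each and 4 descriptor components per atom.
rng = np.random.default_rng(0)
n_features = 4
molecules = [rng.normal(size=(rng.integers(3, 10), n_features)) for _ in range(200)]
X = np.array([molecule_features(atoms) for atoms in molecules])
theta_true = rng.normal(size=n_features)
y = X @ theta_true

idx = select_active_set(X)
A = X[idx]
theta = fit_linear(A, y[idx])  # train only on the selected molecules
grades = [extrapolation_grade(A, x) for x in X]
outlier_grade = extrapolation_grade(A, 100.0 * A[0])
```

After selection, every molecule in the pool has a grade of at most about 1, while the scaled-up input gets a grade well above 1 and would be flagged for labeling. Because this toy property is exactly linear and noise-free, fitting on the selected molecules alone recovers the true coefficients; real descriptors and quantum-chemical targets would not behave this cleanly.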

Authors:
Gubaev, Konstantin; Podryabinkin, Evgeny V.; Shapeev, Alexander V.
Publication Date:
2018-04-19
Sponsoring Org.:
USDOE
OSTI Identifier:
1433745
Grant/Contract Number:  
1150-06_2015
Resource Type:
Publisher's Accepted Manuscript
Journal Name:
Journal of Chemical Physics
Additional Journal Information:
Journal Name: Journal of Chemical Physics; Journal Volume: 148; Journal Issue: 24; Journal ID: ISSN 0021-9606
Publisher:
American Institute of Physics
Country of Publication:
United States
Language:
English

Citation Formats

Gubaev, Konstantin, Podryabinkin, Evgeny V., and Shapeev, Alexander V. Machine learning of molecular properties: Locality and active learning. United States: N. p., 2018. Web. doi:10.1063/1.5005095.
Gubaev, Konstantin, Podryabinkin, Evgeny V., & Shapeev, Alexander V. Machine learning of molecular properties: Locality and active learning. United States. https://doi.org/10.1063/1.5005095
Gubaev, Konstantin, Podryabinkin, Evgeny V., and Shapeev, Alexander V. 2018. "Machine learning of molecular properties: Locality and active learning". United States. https://doi.org/10.1063/1.5005095.
@article{osti_1433745,
title = {Machine learning of molecular properties: Locality and active learning},
author = {Gubaev, Konstantin and Podryabinkin, Evgeny V. and Shapeev, Alexander V.},
abstractNote = {In recent years, machine learning techniques have shown great potential in various problems from a multitude of disciplines, including materials design and drug discovery. Their high computational speed, combined with accuracy comparable to that of density functional theory, makes machine learning algorithms efficient for high-throughput screening of chemical and configurational space. However, the machine learning algorithms available in the literature require large training datasets to reach chemical accuracy, and they also show large errors for the so-called outliers: out-of-sample molecules that are not well represented in the training set. In the present paper, we propose a new machine learning algorithm for predicting molecular properties that addresses these two issues. It is based on a local model of interatomic interactions, which provides high accuracy when trained on relatively small training sets, and on an active learning algorithm for optimally choosing the training set, which significantly reduces the errors for the outliers. We compare our model to other state-of-the-art algorithms from the literature on widely used benchmark tests.},
doi = {10.1063/1.5005095},
journal = {Journal of Chemical Physics},
number = 24,
volume = 148,
place = {United States},
year = {2018},
month = apr
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record
https://doi.org/10.1063/1.5005095

Citation Metrics:
Cited by: 101 works (citation information provided by Web of Science)

Works referenced in this record:

Machine learning of molecular electronic properties in chemical compound space
journal, September 2013


Quantum-chemical insights from deep tensor neural networks
journal, January 2017

  • Schütt, Kristof T.; Arbabzadah, Farhad; Chmiela, Stefan
  • Nature Communications, Vol. 8, Issue 1
  • DOI: 10.1038/ncomms13890

Genetic Optimization of Training Sets for Improved Machine Learning Models of Molecular Properties
journal, March 2017

  • Browning, Nicholas J.; Ramakrishnan, Raghunathan; von Lilienfeld, O. Anatole
  • The Journal of Physical Chemistry Letters, Vol. 8, Issue 7
  • DOI: 10.1021/acs.jpclett.7b00038

Big Data Meets Quantum Chemistry Approximations: The Δ-Machine Learning Approach
journal, April 2015

  • Ramakrishnan, Raghunathan; Dral, Pavlo O.; Rupp, Matthias
  • Journal of Chemical Theory and Computation, Vol. 11, Issue 5
  • DOI: 10.1021/acs.jctc.5b00099

Communication: Understanding molecular representations in machine learning: The role of uniqueness and target similarity
journal, October 2016

  • Huang, Bing; von Lilienfeld, O. Anatole
  • The Journal of Chemical Physics, Vol. 145, Issue 16
  • DOI: 10.1063/1.4964627

Quantum chemistry structures and properties of 134 kilo molecules
journal, August 2014

  • Ramakrishnan, Raghunathan; Dral, Pavlo O.; Rupp, Matthias
  • Scientific Data, Vol. 1, Issue 1
  • DOI: 10.1038/sdata.2014.22

Machine Learning Predictions of Molecular Properties: Accurate Many-Body Potentials and Nonlocality in Chemical Space
journal, June 2015

  • Hansen, Katja; Biegler, Franziska; Ramakrishnan, Raghunathan
  • The Journal of Physical Chemistry Letters, Vol. 6, Issue 12
  • DOI: 10.1021/acs.jpclett.5b00831

Prediction Errors of Molecular Machine Learning Models Lower than Hybrid DFT Error
journal, October 2017

  • Faber, Felix A.; Hutchison, Luke; Huang, Bing
  • Journal of Chemical Theory and Computation, Vol. 13, Issue 11
  • DOI: 10.1021/acs.jctc.7b00577

Enumeration of 166 Billion Organic Small Molecules in the Chemical Universe Database GDB-17
journal, November 2012

  • Ruddigkeit, Lars; van Deursen, Ruud; Blum, Lorenz C.
  • Journal of Chemical Information and Modeling, Vol. 52, Issue 11
  • DOI: 10.1021/ci300415d

Hierarchical modeling of molecular energies using a deep neural network
journal, June 2018

  • Lubbers, Nicholas; Smith, Justin S.; Barros, Kipton
  • The Journal of Chemical Physics, Vol. 148, Issue 24
  • DOI: 10.1063/1.5011181

Active learning of linearly parametrized interatomic potentials
journal, December 2017


A computational high-throughput search for new ternary superalloys
journal, January 2017


Machine Learning for Quantum Mechanical Properties of Atoms in Molecules
journal, July 2015

  • Rupp, Matthias; Ramakrishnan, Raghunathan; von Lilienfeld, O. Anatole
  • The Journal of Physical Chemistry Letters, Vol. 6, Issue 16
  • DOI: 10.1021/acs.jpclett.5b01456