DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Self-Evolving Machine: A Continuously Improving Model for Molecular Thermochemistry

Abstract

Because collecting precise and accurate chemistry data is often challenging, chemistry data sets usually span only a small region of chemical space, which limits the performance and the scope of applicability of data-driven models. To address this issue, we integrated an active learning machine with automatic ab initio calculations to form a self-evolving model that can continuously adapt to new species specified by the user. In the present work, we demonstrate the self-evolving concept by modeling the formation enthalpies of stable closed-shell polycyclic species calculated at the B3LYP/6-31G(2df,p) level of theory. By combining a molecular graph convolutional neural network with a dropout training strategy, the model we developed can predict density functional theory (DFT) enthalpies for a broad range of polycyclic species and assess the quality of each predicted value. For species about which the current model is uncertain, the automatic ab initio calculations provide additional training data to improve the model's performance. For a test set composed of 2858 cyclic and polycyclic hydrocarbons and oxygenates, the enthalpies predicted by the model agree with the reference DFT values with a root-mean-square error of 2.62 kcal/mol. Finally, we found that a model originally trained on hydrocarbons and oxygenates can broaden its prediction coverage to nitrogen-containing species via an active learning process, suggesting that the continuous learning strategy not only improves model accuracy but can also expand a model's predictive capacity to unseen species domains.
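
The loop described in the abstract (predict, flag the species the model is unsure about, run a reference calculation for those, retrain) can be sketched in a few lines. This is a toy under stated assumptions, not the authors' implementation: a bootstrap ensemble (bagging) of one-variable linear fits stands in for the graph convolutional network and its dropout-based uncertainty, a scalar descriptor `x` stands in for the molecular graph, and the hypothetical `oracle` callable stands in for the automatic B3LYP/6-31G(2df,p) calculation. The function names and the fixed `threshold` on the ensemble standard deviation are illustrative choices.

```python
import random
import statistics


def fit_line(xs, ys):
    """Least-squares fit y = a*x + b; a toy stand-in for training one network."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0.0:  # degenerate resample (all x equal): predict the mean
        return 0.0, my
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return a, my - a * mx


def train_ensemble(data, n_models, rng):
    """Fit each member on a bootstrap resample of (x, y) pairs (bagging)."""
    models = []
    for _ in range(n_models):
        sample = [rng.choice(data) for _ in data]
        xs, ys = zip(*sample)
        models.append(fit_line(xs, ys))
    return models


def predict(models, x):
    """Ensemble mean is the prediction; the spread is the uncertainty estimate."""
    preds = [a * x + b for a, b in models]
    return statistics.fmean(preds), statistics.stdev(preds)


def active_learning(data, candidates, oracle, threshold,
                    n_models=10, max_rounds=5, seed=0):
    """Label only the candidates the ensemble is unsure about, then retrain.

    `oracle` is a hypothetical callable standing in for the automatic
    reference (DFT) calculation; `threshold` is an assumed cutoff on the
    ensemble standard deviation.
    """
    rng = random.Random(seed)
    data, pool = list(data), list(candidates)
    for _ in range(max_rounds):
        models = train_ensemble(data, n_models, rng)
        uncertain = [x for x in pool if predict(models, x)[1] > threshold]
        if not uncertain:
            break  # the model is confident on every remaining candidate
        for x in uncertain:
            data.append((x, oracle(x)))  # new training point from the oracle
            pool.remove(x)
    return train_ensemble(data, n_models, rng), data


if __name__ == "__main__":
    def toy_oracle(x):
        return 0.5 * x * x  # nonlinear "truth" the linear members cannot fit exactly

    start = [(x, toy_oracle(x)) for x in (0.0, 1.0, 2.0)]
    models, grown = active_learning(start, [0.5, 1.5, 6.0, 9.0], toy_oracle,
                                    threshold=1.0)
    print("labeled x values:", [x for x, _ in grown[len(start):]])
    print("prediction at 9.0 (mean, std):", predict(models, 9.0))
```

Candidates far outside the initial descriptor range (6.0 and 9.0 in the usage example) tend to produce the largest ensemble spread, so they are the most likely to be sent to the oracle; the exact selection depends on the random resamples. The design point carried over from the abstract is that expensive reference calculations are spent only where the model reports low confidence.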

Authors:
Li, Yi-Pei [1]; Han, Kehang [1]; Grambow, Colin A. [1]; Green, William H. [1]
  1. Massachusetts Inst. of Technology (MIT), Cambridge, MA (United States). Dept. of Chemical Engineering
Publication Date:
February 13, 2019
Research Org.:
Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States). National Energy Research Scientific Computing Center (NERSC)
Sponsoring Org.:
USDOE Office of Science (SC)
OSTI Identifier:
1530407
Grant/Contract Number:
AC02-05CH11231
Resource Type:
Accepted Manuscript
Journal Name:
Journal of Physical Chemistry. A, Molecules, Spectroscopy, Kinetics, Environment, and General Theory
Additional Journal Information:
Journal Volume: 123; Journal Issue: 10; Journal ID: ISSN 1089-5639
Publisher:
American Chemical Society
Country of Publication:
United States
Language:
English
Subject:
37 INORGANIC, ORGANIC, PHYSICAL, AND ANALYTICAL CHEMISTRY

Citation Formats

Li, Yi-Pei, Han, Kehang, Grambow, Colin A., and Green, William H. Self-Evolving Machine: A Continuously Improving Model for Molecular Thermochemistry. United States: N. p., 2019. Web. doi:10.1021/acs.jpca.8b10789.
Li, Yi-Pei, Han, Kehang, Grambow, Colin A., & Green, William H. Self-Evolving Machine: A Continuously Improving Model for Molecular Thermochemistry. United States. https://doi.org/10.1021/acs.jpca.8b10789
Li, Yi-Pei, Han, Kehang, Grambow, Colin A., and Green, William H. 2019. "Self-Evolving Machine: A Continuously Improving Model for Molecular Thermochemistry". United States. https://doi.org/10.1021/acs.jpca.8b10789. https://www.osti.gov/servlets/purl/1530407.
@article{osti_1530407,
title = {Self-Evolving Machine: A Continuously Improving Model for Molecular Thermochemistry},
author = {Li, Yi-Pei and Han, Kehang and Grambow, Colin A. and Green, William H.},
abstractNote = {Because collecting precise and accurate chemistry data is often challenging, chemistry data sets usually only span a small region of chemical space, which limits the performance and the scope of applicability of data-driven models. To address this issue, we integrated an active learning machine with automatic ab initio calculations to form a self-evolving model that can continuously adapt to new species appointed by the users. In the present work, we demonstrate the self-evolving concept by modeling the formation enthalpies of stable closed-shell polycyclic species calculated at the B3LYP/6-31G(2df,p) level of theory. By combining a molecular graph convolutional neural network with a dropout training strategy, the model we developed can predict density functional theory (DFT) enthalpies for a broad range of polycyclic species and assess the quality of each predicted value. For the species which the current model is uncertain about, the automatic ab initio calculations provide additional training data to improve the performance of the model. For a test set composed of 2858 cyclic and polycyclic hydrocarbons and oxygenates, the enthalpies predicted by the model agree with the reference DFT values with a root-mean-square error of 2.62 kcal/mol. Finally, we found that a model originally trained on hydrocarbons and oxygenates can broaden its prediction coverage to nitrogen-containing species via an active learning process, suggesting that the continuous learning strategy is not only able to improve the model accuracy but is also capable of expanding the predictive capacity of a model to unseen species domains.},
doi = {10.1021/acs.jpca.8b10789},
journal = {Journal of Physical Chemistry. A, Molecules, Spectroscopy, Kinetics, Environment, and General Theory},
number = 10,
volume = 123,
place = {United States},
year = {2019},
month = feb
}

Citation Metrics:
Cited by: 39 works
Citation information provided by Web of Science
