DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Diagnostics of Data-Driven Models: Uncertainty Quantification of PM7 Semi-Empirical Quantum Chemical Method

Abstract

We report an evaluation of the semi-empirical quantum chemical method PM7 from the perspective of uncertainty quantification. Specifically, we apply Bound-to-Bound Data Collaboration (B2BDC), an uncertainty quantification framework, to characterize (a) the variability of PM7 model parameter values consistent with the uncertainty in the training data and (b) the propagation of uncertainty from the training data to the model predictions. Experimental heats of formation of a homologous series of linear alkanes are used as the property of interest. The training data are chemically accurate, i.e., they have very low uncertainty by the standards of computational chemistry. The analysis finds no evidence that PM7 is consistent with the entire data set: no single set of parameter values reproduces all training data within their experimental uncertainties. A set of PM7 parameter values was able to capture the training data within ±1 kcal/mol, but not within the smaller uncertainties of the reported data. Nevertheless, PM7 was found to be consistent with subsets of the training data. In such cases, propagating uncertainty from the chemically accurate training data to the predicted values keeps the error within the bounds of chemical accuracy if predictions are made for molecules of comparable size. Otherwise, the error grows linearly with the relative size of the molecules.
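The linear growth of error with molecular size can be illustrated with a deliberately simplified model. The sketch below uses a hypothetical two-parameter additive model, H(n) = A + B·n for an alkane with n carbon atoms (in the spirit of the additivity rules of Benson et al. listed among the referenced works); it is not PM7 and not the computation reported in the paper, and the function name `prediction_interval` and all numbers are illustrative assumptions. Two training values, each known within ±delta, confine (A, B) to a parallelogram; because a linear prediction attains its extremes at the vertices of that region, the prediction interval can be computed exactly from four vertices.

```python
from itertools import product


def prediction_interval(n, n1, y1, n2, y2, delta):
    """Range of A + B*n over all (A, B) consistent with two training points.

    Hypothetical toy model (not PM7): H(n) = A + B*n, where each training
    value y_i at size n_i is known within +/- delta. The feasible set
    {(A, B) : |A + B*n_i - y_i| <= delta, i = 1, 2} is a parallelogram, and a
    linear function of (A, B) attains its minimum and maximum at its vertices.
    """
    values = []
    for s1, s2 in product((-1.0, 1.0), repeat=2):
        # Vertex where the model sits on an edge of each training interval.
        b = ((y2 + s2 * delta) - (y1 + s1 * delta)) / (n2 - n1)
        a = (y1 + s1 * delta) - b * n1
        values.append(a + b * n)
    return min(values), max(values)


# Hypothetical training data: sizes 2 and 4, each value known within +/- 0.5.
for n in (3, 4, 8, 16):
    lo, hi = prediction_interval(n, n1=2, y1=-20.0, n2=4, y2=-30.0, delta=0.5)
    print(f"n={n:2d}  interval width={hi - lo:.2f}")
```

The printed widths are 1.00, 1.00, 5.00 and 13.00. With t = (n - n1)/(n2 - n1), the width of this toy model works out to 2·delta·(|t| + |1 - t|): it stays at 2·delta for sizes between the training molecules (0 ≤ t ≤ 1) and grows as 2·delta·(2t - 1), i.e. linearly, when extrapolating beyond them.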

Authors:
Oreluk, James; Liu, Zhenyuan; Hegde, Arun; Li, Wenyu; Packard, Andrew; Frenklach, Michael; Zubarev, Dmitry
Publication Date:
Wed Sep 05 2018
Research Org.:
Univ. of Utah, Salt Lake City, UT (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1467754
Alternate Identifier(s):
OSTI ID: 1545322
Grant/Contract Number:  
NA0002375
Resource Type:
Published Article
Journal Name:
Scientific Reports
Additional Journal Information:
Journal Name: Scientific Reports; Journal Volume: 8; Journal Issue: 1; Journal ID: ISSN 2045-2322
Publisher:
Nature Publishing Group
Country of Publication:
United Kingdom
Language:
English
Subject:
42 ENGINEERING

Citation Formats

Oreluk, James, Liu, Zhenyuan, Hegde, Arun, Li, Wenyu, Packard, Andrew, Frenklach, Michael, and Zubarev, Dmitry. Diagnostics of Data-Driven Models: Uncertainty Quantification of PM7 Semi-Empirical Quantum Chemical Method. United Kingdom: N. p., 2018. Web. doi:10.1038/s41598-018-31677-y.
Oreluk, James, Liu, Zhenyuan, Hegde, Arun, Li, Wenyu, Packard, Andrew, Frenklach, Michael, & Zubarev, Dmitry. Diagnostics of Data-Driven Models: Uncertainty Quantification of PM7 Semi-Empirical Quantum Chemical Method. United Kingdom. https://doi.org/10.1038/s41598-018-31677-y
Oreluk, James, Liu, Zhenyuan, Hegde, Arun, Li, Wenyu, Packard, Andrew, Frenklach, Michael, and Zubarev, Dmitry. 2018. "Diagnostics of Data-Driven Models: Uncertainty Quantification of PM7 Semi-Empirical Quantum Chemical Method". United Kingdom. https://doi.org/10.1038/s41598-018-31677-y.
@article{osti_1467754,
title = {Diagnostics of Data-Driven Models: Uncertainty Quantification of PM7 Semi-Empirical Quantum Chemical Method},
author = {Oreluk, James and Liu, Zhenyuan and Hegde, Arun and Li, Wenyu and Packard, Andrew and Frenklach, Michael and Zubarev, Dmitry},
abstractNote = {We report an evaluation of a semi-empirical quantum chemical method PM7 from the perspective of uncertainty quantification. Specifically, we apply Bound-to-Bound Data Collaboration, an uncertainty quantification framework, to characterize (a) variability of PM7 model parameter values consistent with the uncertainty in the training data and (b) uncertainty propagation from the training data to the model predictions. Experimental heats of formation of a homologous series of linear alkanes are used as the property of interest. The training data are chemically accurate, i.e., they have very low uncertainty by the standards of computational chemistry. The analysis does not find evidence of PM7 consistency with the entire data set considered as no single set of parameter values is found that captures the experimental uncertainties of all training data. A set of parameter values for PM7 was able to capture the training data within ±1 kcal/mol, but not to the smaller level of uncertainty in the reported data. Nevertheless, PM7 was found to be consistent for subsets of the training data. In such cases, uncertainty propagation from the chemically accurate training data to the predicted values preserves error within bounds of chemical accuracy if predictions are made for the molecules of comparable size. Otherwise, the error grows linearly with the relative size of the molecules.},
doi = {10.1038/s41598-018-31677-y},
journal = {Scientific Reports},
number = 1,
volume = 8,
place = {United Kingdom},
year = {2018},
month = sep
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record
https://doi.org/10.1038/s41598-018-31677-y

Citation Metrics:
Cited by: 4 works
Citation information provided by
Web of Science

Figures / Tables:

Figure 1: Illustration of the B2BDC methodology, specifically the intersection of feasible sets, for the problem described in Edwards et al. (ref. 31). The shaded regions in Panels a, b and c are individual feasible sets, $\mathcal{F}_e$, each being the set of model parameter values that predict the ionization potential of a particular water cluster within its respective uncertainty. Panel a: individual feasible sets for the dimer and trimer configurations of water, $\mathcal{F}_1$ and $\mathcal{F}_2$, respectively. Their intersection is the feasible set $\mathcal{F}_{1:2}$, shown in dark red. Panel b: the addition of the feasible set for the tetramer configuration of water, $\mathcal{F}_3$, shown in teal. The intersection, $\mathcal{F}_{1:3}$, is shown in dark red. Panel c: the feasible set of the pentamer configuration of water, $\mathcal{F}_4$, shown in dark grey, primarily overlaps with $\mathcal{F}_3$. The intersection of all individual feasible sets forms $\mathcal{F}_{1:4}$, shown in dark red. Panel d: prediction of the ionization potential of a water hexamer on the feasible set $\mathcal{F}_{1:4}$. The prediction is the range of the prediction model over this feasible set (Eq. 3). The resulting prediction interval for the ionization potential is shown on the right-hand side in red.
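The feasible-set construction in the caption can be sketched in a few lines if it is reduced to a single model parameter and linear surrogate models. That reduction is an assumption made here for brevity; the framework itself works with many parameters and nonlinear surrogates through numerical optimization. In this hypothetical sketch each dataset (for example, one water cluster) supplies a linear model `slope * x + offset` and an uncertainty interval `[lower, upper]`. Its individual feasible set is then an interval of x, the running intersection plays the role of $\mathcal{F}_{1:4}$, an empty intersection signals inconsistency, and the prediction interval is the range of a prediction model over the intersection. The names `Dataset`, `feasible_set` and `predict`, and all numbers, are illustrative and not taken from the paper.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple

Interval = Tuple[float, float]


@dataclass
class Dataset:
    """One hypothetical dataset: linear surrogate slope*x + offset, bounds [lower, upper]."""
    slope: float
    offset: float
    lower: float
    upper: float

    def feasible_interval(self) -> Optional[Interval]:
        """Parameter values x with lower <= slope*x + offset <= upper (None if none exist)."""
        if self.slope == 0.0:
            # A constant model is either always or never within the bounds.
            if self.lower <= self.offset <= self.upper:
                return (float("-inf"), float("inf"))
            return None
        ends = ((self.lower - self.offset) / self.slope,
                (self.upper - self.offset) / self.slope)
        return (min(ends), max(ends))


def feasible_set(prior: Interval, datasets: Iterable[Dataset]) -> Optional[Interval]:
    """Intersect the prior parameter range with each individual feasible set."""
    lo, hi = prior
    for d in datasets:
        interval = d.feasible_interval()
        if interval is None:
            return None
        lo, hi = max(lo, interval[0]), min(hi, interval[1])
        if lo > hi:
            return None  # empty intersection: the datasets are inconsistent
    return (lo, hi)


def predict(feasible: Interval, slope: float, offset: float) -> Interval:
    """Range of the linear prediction model slope*x + offset over the feasible set."""
    ends = (slope * feasible[0] + offset, slope * feasible[1] + offset)
    return (min(ends), max(ends))


data = [
    Dataset(slope=2.0, offset=1.0, lower=5.0, upper=9.0),    # x in [2, 4]
    Dataset(slope=-1.0, offset=10.0, lower=5.0, upper=7.5),  # x in [2.5, 5]
    Dataset(slope=1.0, offset=0.0, lower=3.0, upper=6.0),    # x in [3, 6]
]
region = feasible_set((0.0, 10.0), data)
print("feasible set:", region)  # (3.0, 4.0)
if region is not None:
    print("prediction:", predict(region, 3.0, -1.0))  # (8.0, 11.0)

# A dataset whose feasible interval misses [3, 4] makes the collection inconsistent.
print(feasible_set((0.0, 10.0), data + [Dataset(1.0, 0.0, 4.5, 5.0)]))  # None
```

Interval arithmetic is exact here only because the sketch has one parameter and linear models; with several parameters the intersection is a region like those shaded in the figure, and its extremes have to be found by optimization.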


Works referenced in this record:

Transforming data into knowledge—Process Informatics for combustion chemistry
journal, January 2007


Semiempirical quantum–chemical methods
journal, July 2013

  • Thiel, Walter
  • Wiley Interdisciplinary Reviews: Computational Molecular Science, Vol. 4, Issue 2
  • DOI: 10.1002/wcms.1161

Polarizable Force Fields: History, Test Cases, and Prospects
journal, September 2007

  • Warshel, Arieh; Kato, Mitsunori; Pisliakov, Andrei V.
  • Journal of Chemical Theory and Computation, Vol. 3, Issue 6
  • DOI: 10.1021/ct700127w

Quest for a universal density functional: the accuracy of density functionals across a broad spectrum of databases in chemistry and physics
journal, March 2014

  • Peverati, Roberto; Truhlar, Donald G.
  • Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, Vol. 372, Issue 2011
  • DOI: 10.1098/rsta.2012.0476

Computational prediction of protein interfaces: A review of data driven methods
journal, October 2015


Uncertainty quantification: Making predictions of complex reaction systems reliable
journal, October 2010


The Elements of Statistical Learning
book, January 2009


Comparison of Statistical and Deterministic Frameworks of Uncertainty Quantification
journal, January 2016

  • Frenklach, Michael; Packard, Andrew; Garcia-Donato, Gonzalo
  • SIAM/ASA Journal on Uncertainty Quantification, Vol. 4, Issue 1
  • DOI: 10.1137/15M1019131

Towards cleaner combustion engines through groundbreaking detailed chemical kinetic models
journal, January 2011

  • Battin-Leclerc, Frédérique; Blurock, Edward; Bounaceur, Roda
  • Chemical Society Reviews, Vol. 40, Issue 9
  • DOI: 10.1039/c0cs00207k

Uncertainty quantification in thermochemistry, benchmarking electronic structure computations, and Active Thermochemical Tables
journal, January 2014

  • Ruscic, Branko
  • International Journal of Quantum Chemistry, Vol. 114, Issue 17
  • DOI: 10.1002/qua.24605

Deep learning for computational chemistry
journal, March 2017

  • Goh, Garrett B.; Hodas, Nathan O.; Vishnu, Abhinav
  • Journal of Computational Chemistry, Vol. 38, Issue 16
  • DOI: 10.1002/jcc.24764

Semiempirical Quantum-Chemical Orthogonalization-Corrected Methods: Theory, Implementation, and Parameters
journal, January 2016

  • Dral, Pavlo O.; Wu, Xin; Spörkel, Lasse
  • Journal of Chemical Theory and Computation, Vol. 12, Issue 3
  • DOI: 10.1021/acs.jctc.5b01046

Semiempirical Quantum-Chemical Orthogonalization-Corrected Methods: Benchmarks for Ground-State Properties
journal, January 2016

  • Dral, Pavlo O.; Wu, Xin; Spörkel, Lasse
  • Journal of Chemical Theory and Computation, Vol. 12, Issue 3
  • DOI: 10.1021/acs.jctc.5b01047

Machine-learned approximations to Density Functional Theory Hamiltonians
journal, February 2017

  • Hegde, Ganesh; Bowen, R. Chris
  • Scientific Reports, Vol. 7, Issue 1
  • DOI: 10.1038/srep42669

Prediction uncertainty from models and data
conference, January 2002

  • Frenklach, M.; Packard, A.; Seiler, P.
  • Proceedings of 2002 American Control Conference, Proceedings of the 2002 American Control Conference (IEEE Cat. No.CH37301)
  • DOI: 10.1109/ACC.2002.1024578

The Effects of Computational Modeling Errors on the Estimation of Statistical Mechanical Variables
journal, March 2012

  • Faver, John C.; Yang, Wei; Merz, Kenneth M.
  • Journal of Chemical Theory and Computation, Vol. 8, Issue 10
  • DOI: 10.1021/ct300024z

QSAR Modeling: Where Have You Been? Where Are You Going To?
journal, January 2014

  • Cherkasov, Artem; Muratov, Eugene N.; Fourches, Denis
  • Journal of Medicinal Chemistry, Vol. 57, Issue 12
  • DOI: 10.1021/jm4004285

Consistency of a Reaction Dataset
journal, November 2004

  • Feeley, Ryan; Seiler, Pete; Packard, Andrew
  • The Journal of Physical Chemistry A, Vol. 108, Issue 44
  • DOI: 10.1021/jp047524w

Density functional theory is straying from the path toward the exact functional
journal, January 2017

  • Medvedev, Michael G.; Bushmarinov, Ivan S.; Sun, Jianwei
  • Science, Vol. 355, Issue 6320
  • DOI: 10.1126/science.aah5975

Comparison of Molecular Mechanics, Semi-Empirical Quantum Mechanical, and Density Functional Theory Methods for Scoring Protein–Ligand Interactions
journal, June 2013

  • Yilmazer, Nusret Duygu; Korth, Martin
  • The Journal of Physical Chemistry B, Vol. 117, Issue 27
  • DOI: 10.1021/jp402719k

Quantum-chemical insights from deep tensor neural networks
journal, January 2017

  • Schütt, Kristof T.; Arbabzadah, Farhad; Chmiela, Stefan
  • Nature Communications, Vol. 8, Issue 1
  • DOI: 10.1038/ncomms13890

Hybrid Density Functional Methods Empirically Optimized for the Computation of ¹³C and ¹H Chemical Shifts in Chloroform Solution
journal, May 2006

  • Wiitala, Keith W.; Hoye, Thomas R.; Cramer, Christopher J.
  • Journal of Chemical Theory and Computation, Vol. 2, Issue 4
  • DOI: 10.1021/ct6001016

Improving the accuracy of Møller-Plesset perturbation theory with neural networks
journal, October 2017

  • McGibbon, Robert T.; Taube, Andrew G.; Donchev, Alexander G.
  • The Journal of Chemical Physics, Vol. 147, Issue 16
  • DOI: 10.1063/1.4986081

Numerical approaches for collaborative data processing
journal, December 2006

  • Seiler, Pete; Frenklach, Michael; Packard, Andrew
  • Optimization and Engineering, Vol. 7, Issue 4
  • DOI: 10.1007/s11081-006-0350-4

Atomic Radius and Charge Parameter Uncertainty in Biomolecular Solvation Energy Calculations
journal, January 2018

  • Yang, Xiu; Lei, Huan; Gao, Peiyuan
  • Journal of Chemical Theory and Computation, Vol. 14, Issue 2
  • DOI: 10.1021/acs.jctc.7b00905

Deep Architectures and Deep Learning in Chemoinformatics: The Prediction of Aqueous Solubility for Drug-Like Molecules
journal, July 2013

  • Lusci, Alessandro; Pollastri, Gianluca; Baldi, Pierre
  • Journal of Chemical Information and Modeling, Vol. 53, Issue 7
  • DOI: 10.1021/ci400187y

Error Assessment of Computational Models in Chemistry
journal, April 2017

  • Simm, Gregor N.; Proppe, Jonny; Reiher, Markus
  • CHIMIA International Journal for Chemistry, Vol. 71, Issue 4
  • DOI: 10.2533/chimia.2017.202

An Empirical Polarizable Force Field Based on the Classical Drude Oscillator Model: Development History and Recent Applications
journal, January 2016


Design of Density Functionals by Combining the Method of Constraint Satisfaction with Parametrization for Thermochemistry, Thermochemical Kinetics, and Noncovalent Interactions
journal, January 2006

  • Zhao, Yan; Schultz, Nathan E.; Truhlar, Donald G.
  • Journal of Chemical Theory and Computation, Vol. 2, Issue 2
  • DOI: 10.1021/ct0502763

Perspective on density functional theory
journal, April 2012

  • Burke, Kieron
  • The Journal of Chemical Physics, Vol. 136, Issue 15
  • DOI: 10.1063/1.4704546

Semiempirical Quantum Mechanical Methods for Noncovalent Interactions for Chemical and Biochemical Applications
journal, April 2016


Additivity rules for the estimation of thermochemical properties
journal, June 1969

  • Benson, Sidney W.; Cruickshank, F. R.; Golden, D. M.
  • Chemical Reviews, Vol. 69, Issue 3
  • DOI: 10.1021/cr60259a002

Chemical Kinetics and Combustion Modeling
journal, October 1990


Figures/Tables have been extracted from DOE-funded journal article accepted manuscripts.