Big Data Meets Quantum Chemistry Approximations: The Δ-Machine Learning Approach
Abstract
Chemically accurate and comprehensive studies of the virtual space of all possible molecules are severely limited by the computational cost of quantum chemistry. Here we introduce a composite strategy that adds machine learning corrections to computationally inexpensive approximate legacy quantum methods. After training, highly accurate predictions of enthalpies, free energies, entropies, and electron correlation energies are possible for significantly larger molecular sets than used for training. For thermochemical properties of up to 16k isomers of C7H10O2 we present numerical evidence that chemical accuracy can be reached. We also predict electron correlation energy in post-Hartree-Fock methods at the computational cost of Hartree-Fock, and we establish a qualitative relationship between molecular entropy and electron correlation. The transferability of our approach is demonstrated, using semiempirical quantum chemistry and machine learning models trained on 1% and 10% of 134k organic molecules, to reproduce enthalpies of all remaining molecules at density functional theory level of accuracy.
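The composite strategy described in the abstract (an inexpensive baseline value plus a learned correction toward a more accurate target) can be illustrated with a minimal sketch. It assumes kernel ridge regression with a Laplacian kernel as the learner; that choice, the function names, and the hyperparameters `sigma` and `lam` are illustrative assumptions and are not taken from this record. The descriptors and "energies" are synthetic stand-ins, not chemical data.

```python
import numpy as np


def laplacian_kernel(a, b, sigma):
    """Laplacian kernel exp(-|a_i - b_j|_1 / sigma) for all row pairs."""
    l1 = np.abs(a[:, None, :] - b[None, :, :]).sum(axis=2)
    return np.exp(-l1 / sigma)


def fit_delta(x_train, baseline_train, target_train, sigma=5.0, lam=1e-8):
    """Fit kernel ridge regression weights to the difference target - baseline."""
    delta = target_train - baseline_train
    k = laplacian_kernel(x_train, x_train, sigma)
    return np.linalg.solve(k + lam * np.eye(len(x_train)), delta)


def predict_delta(x_new, baseline_new, x_train, alpha, sigma=5.0):
    """Return the cheap baseline value plus the learned correction."""
    return baseline_new + laplacian_kernel(x_new, x_train, sigma) @ alpha


# Synthetic stand-ins (assumption): 3-component "descriptors", a "target"
# property, and a "baseline" that misses a smooth correction term.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=(400, 3))
s = x.sum(axis=1)
target = np.sin(2.0 * s) + s**2
baseline = target - 0.5 * np.cos(s)

# Train on a small subset, then predict the remaining points.
n_train = 100
alpha = fit_delta(x[:n_train], baseline[:n_train], target[:n_train])
pred = predict_delta(x[n_train:], baseline[n_train:], x[:n_train], alpha)

mae_baseline = np.mean(np.abs(baseline[n_train:] - target[n_train:]))
mae_delta = np.mean(np.abs(pred - target[n_train:]))
print(f"baseline MAE: {mae_baseline:.3f}, delta-corrected MAE: {mae_delta:.3f}")
```

The model learns only the difference between the two levels, so at prediction time the baseline still has to be computed for every new point; the learned term supplies the correction on top of it.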
- Authors:
- Ramakrishnan, Raghunathan; Dral, Pavlo O.; Rupp, Matthias; von Lilienfeld, O. Anatole
- University of Basel (Switzerland)
- Max-Planck-Institut für Kohlenforschung, Mülheim an der Ruhr (Germany); Friedrich-Alexander University Erlangen-Nuremberg, Bamberg (Germany)
- University of Basel (Switzerland); Argonne National Laboratory (ANL), Argonne, IL (United States). Argonne Leadership Computing Facility (ALCF)
- Publication Date:
- 2015-04-10
- Research Org.:
- Argonne National Laboratory (ANL), Argonne, IL (United States)
- Sponsoring Org.:
- USDOE Office of Science (SC); Swiss National Science Foundation (SNSF)
- OSTI Identifier:
- 1392925
- Grant/Contract Number:
- AC02-06CH11357; PP00P2_138932
- Resource Type:
- Accepted Manuscript
- Journal Name:
- Journal of Chemical Theory and Computation
- Additional Journal Information:
- Journal Volume: 11; Journal Issue: 5; Journal ID: ISSN 1549-9618
- Publisher:
- American Chemical Society
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 37 INORGANIC, ORGANIC, PHYSICAL, AND ANALYTICAL CHEMISTRY
Citation Formats
Ramakrishnan, Raghunathan, Dral, Pavlo O., Rupp, Matthias, and von Lilienfeld, O. Anatole. Big Data Meets Quantum Chemistry Approximations: The Δ-Machine Learning Approach. United States: N. p., 2015. Web. doi:10.1021/acs.jctc.5b00099.
Ramakrishnan, Raghunathan, Dral, Pavlo O., Rupp, Matthias, & von Lilienfeld, O. Anatole. Big Data Meets Quantum Chemistry Approximations: The Δ-Machine Learning Approach. United States. https://doi.org/10.1021/acs.jctc.5b00099
Ramakrishnan, Raghunathan, Dral, Pavlo O., Rupp, Matthias, and von Lilienfeld, O. Anatole. April 10, 2015. "Big Data Meets Quantum Chemistry Approximations: The Δ-Machine Learning Approach". United States. https://doi.org/10.1021/acs.jctc.5b00099. https://www.osti.gov/servlets/purl/1392925.
@article{osti_1392925,
title = {Big Data Meets Quantum Chemistry Approximations: The Δ-Machine Learning Approach},
author = {Ramakrishnan, Raghunathan and Dral, Pavlo O. and Rupp, Matthias and von Lilienfeld, O. Anatole},
abstractNote = {Chemically accurate and comprehensive studies of the virtual space of all possible molecules are severely limited by the computational cost of quantum chemistry. Here we introduce a composite strategy that adds machine learning corrections to computationally inexpensive approximate legacy quantum methods. After training, highly accurate predictions of enthalpies, free energies, entropies, and electron correlation energies are possible for significantly larger molecular sets than used for training. For thermochemical properties of up to 16k isomers of C7H10O2 we present numerical evidence that chemical accuracy can be reached. We also predict electron correlation energy in post-Hartree-Fock methods at the computational cost of Hartree-Fock, and we establish a qualitative relationship between molecular entropy and electron correlation. The transferability of our approach is demonstrated, using semiempirical quantum chemistry and machine learning models trained on 1% and 10% of 134k organic molecules, to reproduce enthalpies of all remaining molecules at density functional theory level of accuracy.},
doi = {10.1021/acs.jctc.5b00099},
journal = {Journal of Chemical Theory and Computation},
number = 5,
volume = 11,
place = {United States},
year = {2015},
month = {4}
}
Works referenced in this record:
Stochastic Voyages into Uncharted Chemical Space Produce a Representative Library of All Possible Drug-Like Compounds
journal, May 2013
- Virshup, Aaron M.; Contreras-García, Julia; Wipf, Peter
- Journal of the American Chemical Society, Vol. 135, Issue 19
First principles view on chemical compound space: Gaining rigorous atomistic control of molecular properties
journal, February 2013
- von Lilienfeld, O. Anatole
- International Journal of Quantum Chemistry, Vol. 113, Issue 12
Combinatorial thinking in chemistry and biology
journal, April 1997
- Ellman, J.; Stoddard, B.; Wells, J.
- Proceedings of the National Academy of Sciences, Vol. 94, Issue 7
Click Chemistry: Diverse Chemical Function from a Few Good Reactions
journal, June 2001
- Kolb, Hartmuth C.; Finn, M. G.; Sharpless, K. Barry
- Angewandte Chemie International Edition, Vol. 40, Issue 11, p. 2004-2021
Towards the computational design of solid catalysts
journal, April 2009
- Nørskov, J.; Bligaard, T.; Rossmeisl, J.
- Nature Chemistry, Vol. 1, Issue 1, p. 37-46
The Many Roles of Computation in Drug Discovery
journal, March 2004
- Jorgensen, W. L.
- Science, Vol. 303, Issue 5665
De novo design: balancing novelty and confined chemical space
journal, June 2010
- Kutchukian, Peter S.; Shakhnovich, Eugene I.
- Expert Opinion on Drug Discovery, Vol. 5, Issue 8
The Harvard Clean Energy Project: Large-Scale Computational Screening and Design of Organic Photovoltaics on the World Community Grid
journal, August 2011
- Hachmann, Johannes; Olivares-Amaya, Roberto; Atahan-Evrenk, Sule
- The Journal of Physical Chemistry Letters, Vol. 2, Issue 17
Commentary: The Materials Project: A materials genome approach to accelerating materials innovation
journal, July 2013
- Jain, Anubhav; Ong, Shyue Ping; Hautier, Geoffroy
- APL Materials, Vol. 1, Issue 1
Inverse Strategies for Molecular Design
journal, January 1996
- Kuhn, Christoph; Beratan, David N.
- The Journal of Physical Chemistry, Vol. 100, Issue 25
The inverse band-structure problem of finding an atomic configuration with given electronic properties
journal, November 1999
- Franceschetti, Alberto; Zunger, Alex
- Nature, Vol. 402, Issue 6757
Variational Particle Number Approach for Rational Compound Design
journal, October 2005
- von Lilienfeld, O. Anatole; Lins, Roberto D.; Rothlisberger, Ursula
- Physical Review Letters, Vol. 95, Issue 15
Designing Molecules by Optimizing Potentials
journal, March 2006
- Wang, Mingliang; Hu, Xiangqian; Beratan, David N.
- Journal of the American Chemical Society, Vol. 128, Issue 10
A comprehensive chemical kinetic combustion model for the four butanol isomers
journal, June 2012
- Sarathy, S. Mani; Vranckx, Stijn; Yasunaga, Kenji
- Combustion and Flame, Vol. 159, Issue 6
Combustion and pyrolysis of iso-butanol: Experimental and chemical kinetic modeling study
journal, October 2013
- Merchant, Shamel S.; Zanoelo, Everton Fernando; Speth, Raymond L.
- Combustion and Flame, Vol. 160, Issue 10
Chemical Kinetic Data Base for Combustion Chemistry. Part I. Methane and Related Compounds
journal, July 1986
- Tsang, W.; Hampson, R. F.
- Journal of Physical and Chemical Reference Data, Vol. 15, Issue 3
Ab initio quantum chemistry: Methodology and applications
journal, May 2005
- Friesner, R. A.
- Proceedings of the National Academy of Sciences, Vol. 102, Issue 19
Gaussian‐1 theory: A general procedure for prediction of molecular energies
journal, May 1989
- Pople, John A.; Head‐Gordon, Martin; Fox, Douglas J.
- The Journal of Chemical Physics, Vol. 90, Issue 10
Gaussian‐2 theory for molecular energies of first‐ and second‐row compounds
journal, June 1991
- Curtiss, Larry A.; Raghavachari, Krishnan; Trucks, Gary W.
- The Journal of Chemical Physics, Vol. 94, Issue 11
Gaussian-4 theory
journal, February 2007
- Curtiss, Larry A.; Redfern, Paul C.; Raghavachari, Krishnan
- The Journal of Chemical Physics, Vol. 126, Issue 8
Gaussian-4 theory using reduced order perturbation theory
journal, September 2007
- Curtiss, Larry A.; Redfern, Paul C.; Raghavachari, Krishnan
- The Journal of Chemical Physics, Vol. 127, Issue 12
Quantum chemistry structures and properties of 134 kilo molecules
journal, August 2014
- Ramakrishnan, Raghunathan; Dral, Pavlo O.; Rupp, Matthias
- Scientific Data, Vol. 1, Issue 1
Neural network approach to quantum-chemistry data: Accurate prediction of density functional theory energies
journal, August 2009
- Balabin, Roman M.; Lomakina, Ekaterina I.
- The Journal of Chemical Physics, Vol. 131, Issue 7
Combined first-principles calculation and neural-network correction approach for heat of formation
journal, December 2003
- Hu, LiHong; Wang, XiuJun; Wong, LaiHo
- The Journal of Chemical Physics, Vol. 119, Issue 22
First-principles energetics of water clusters and ice: A many-body analysis
journal, December 2013
- Gillan, M. J.; Alfè, D.; Bartók, A. P.
- The Journal of Chemical Physics, Vol. 139, Issue 24
Fast and Accurate Modeling of Molecular Atomization Energies with Machine Learning
journal, January 2012
- Rupp, Matthias; Tkatchenko, Alexandre; Müller, Klaus-Robert
- Physical Review Letters, Vol. 108, Issue 5
Assessment and Validation of Machine Learning Methods for Predicting Molecular Atomization Energies
journal, July 2013
- Hansen, Katja; Montavon, Grégoire; Biegler, Franziska
- Journal of Chemical Theory and Computation, Vol. 9, Issue 8
Machine learning of molecular electronic properties in chemical compound space
journal, September 2013
- Montavon, Grégoire; Rupp, Matthias; Gobre, Vivekanand
- New Journal of Physics, Vol. 15, Issue 9
Enumeration of 166 Billion Organic Small Molecules in the Chemical Universe Database GDB-17
journal, November 2012
- Ruddigkeit, Lars; van Deursen, Ruud; Blum, Lorenz C.
- Journal of Chemical Information and Modeling, Vol. 52, Issue 11
Challenges for Density Functional Theory
journal, December 2011
- Cohen, Aron J.; Mori-Sánchez, Paula; Yang, Weitao
- Chemical Reviews, Vol. 112, Issue 1
Orthogonalization corrections for semiempirical methods
journal, April 2000
- Weber, Wolfgang; Thiel, Walter
- Theoretical Chemistry Accounts: Theory, Computation, and Modeling (Theoretica Chimica Acta), Vol. 103, Issue 6
Discovering chemistry with an ab initio nanoreactor
journal, November 2014
- Wang, Lee-Ping; Titov, Alexey; McGibbon, Robert
- Nature Chemistry, Vol. 6, Issue 12
Nearsightedness of electronic matter
journal, August 2005
- Prodan, E.; Kohn, W.
- Proceedings of the National Academy of Sciences, Vol. 102, Issue 33
Self-consistent-charge density-functional tight-binding method for simulations of complex materials properties
journal, September 1998
- Elstner, M.; Porezag, D.; Jungnickel, G.
- Physical Review B, Vol. 58, Issue 11, p. 7260-7268
Functional designed to include surface effects in self-consistent density functional theory
journal, August 2005
- Armiento, R.; Mattsson, A. E.
- Physical Review B, Vol. 72, Issue 8
SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules
journal, February 1988
- Weininger, David
- Journal of Chemical Information and Modeling, Vol. 28, Issue 1
Open Babel: An open chemical toolbox
journal, October 2011
- O'Boyle, Noel M.; Banck, Michael; James, Craig A.
- Journal of Cheminformatics, Vol. 3, Issue 1
From atoms and bonds to three-dimensional atomic coordinates: automatic model builders
journal, November 1993
- Sadowski, Jens; Gasteiger, Johann
- Chemical Reviews, Vol. 93, Issue 7
Optimization of parameters for semiempirical methods VI: more modifications to the NDDO approximations and re-optimization of parameters
journal, November 2012
- Stewart, James J. P.
- Journal of Molecular Modeling, Vol. 19, Issue 1
Ab Initio Calculation of Vibrational Absorption and Circular Dichroism Spectra Using Density Functional Force Fields
journal, November 1994
- Stephens, P. J.; Devlin, F. J.; Chabalowski, C. F.
- The Journal of Physical Chemistry, Vol. 98, Issue 45, p. 11623-11627
Data-mined similarity function between material compositions
journal, December 2013
- Yang, Lusann; Ceder, Gerbrand
- Physical Review B, Vol. 88, Issue 22
Quantum chemistry structures and properties of 134 kilo molecules
text, January 2014
- Ramakrishnan, Raghunathan; Dral, Pavlo O.; Rupp, Matthias
- Springer Nature
Works referencing / citing this record:
Computational Approach to Molecular Catalysis by 3d Transition Metals: Challenges and Opportunities
journal, October 2018
- Vogiatzis, Konstantinos D.; Polynski, Mikhail V.; Kirkland, Justin K.
- Chemical Reviews, Vol. 119, Issue 4
Boosting Quantum Machine Learning Models with a Multilevel Combination Technique: Pople Diagrams Revisited
journal, December 2018
- Zaspel, Peter; Huang, Bing; Harbrecht, Helmut
- Journal of Chemical Theory and Computation, Vol. 15, Issue 3
Self-Parametrizing System-Focused Atomistic Models
journal, January 2020
- Brunken, Christoph; Reiher, Markus
- Journal of Chemical Theory and Computation, Vol. 16, Issue 3
Regression Clustering for Improved Accuracy and Training Costs with Molecular-Orbital-Based Machine Learning
journal, October 2019
- Cheng, Lixue; Kovachki, Nikola B.; Welborn, Matthew
- Journal of Chemical Theory and Computation, Vol. 15, Issue 12
Breaking the Coupled Cluster Barrier for Machine-Learned Potentials of Large Molecules: The Case of 15-Atom Acetylacetone
journal, May 2021
- Qu, Chen; Houston, Paul L.; Conte, Riccardo
- The Journal of Physical Chemistry Letters, Vol. 12, Issue 20
Learning a Local-Variable Model of Aromatic and Conjugated Systems
journal, December 2017
- Matlock, Matthew K.; Dang, Na Le; Swamidass, S. Joshua
- ACS Central Science, Vol. 4, Issue 1
Read between the Molecules: Computational Insights into Organic Semiconductors
journal, November 2018
- Gryn’ova, Ganna; Lin, Kun-Han; Corminboeuf, Clémence
- Journal of the American Chemical Society, Vol. 140, Issue 48
Quantum-chemical insights from deep tensor neural networks
journal, January 2017
- Schütt, Kristof T.; Arbabzadah, Farhad; Chmiela, Stefan
- Nature Communications, Vol. 8, Issue 1
Approaching coupled cluster accuracy with a general-purpose neural network potential through transfer learning
journal, July 2019
- Smith, Justin S.; Nebgen, Benjamin T.; Zubatyuk, Roman
- Nature Communications, Vol. 10, Issue 1
Energy refinement and analysis of structures in the QM9 database via a highly accurate quantum chemical method
journal, July 2019
- Kim, Hyungjun; Park, Ji Young; Choi, Sunghwan
- Scientific Data, Vol. 6, Issue 1
Energy-free machine learning force field for aluminum
journal, August 2017
- Kruglov, Ivan; Sergeev, Oleg; Yanilkin, Alexey
- Scientific Reports, Vol. 7, Issue 1
Deep elastic strain engineering of bandgap through machine learning
journal, February 2019
- Shi, Zhe; Tsymbalov, Evgenii; Dao, Ming
- Proceedings of the National Academy of Sciences, Vol. 116, Issue 10
A mixed quantum chemistry/machine learning approach for the fast and accurate prediction of biochemical redox potentials and its large-scale application to 315,000 redox reactions
journal, April 2019
- Jinich, Adrian; Sanchez-Lengeling, Benjamin; Ren, Haniu
- ACS Central Science
Machine learning unifies the modeling of materials and molecules
journal, December 2017
- Bartók, Albert P.; De, Sandip; Poelking, Carl
- Science Advances, Vol. 3, Issue 12
Electronic structure at coarse-grained resolutions from supervised machine learning
journal, March 2019
- Jackson, Nicholas E.; Bowen, Alec S.; Antony, Lucas W.
- Science Advances, Vol. 5, Issue 3
Application of Computational Biology and Artificial Intelligence Technologies in Cancer Precision Drug Discovery
journal, November 2019
- Nagarajan, Nagasundaram; Yapp, Edward K. Y.; Le, Nguyen Quoc Khanh
- BioMed Research International, Vol. 2019
Deep Learning for Deep Chemistry: Optimizing the Prediction of Chemical Patterns
journal, November 2019
- Cova, Tânia F. G. G.; Pais, Alberto A. C. C.
- Frontiers in Chemistry, Vol. 7
Modelling Chemical Reasoning to Predict Reactions
text, January 2016
- Segler, Marwin H. S.; Waller, Mark P.
- arXiv
Steering Orbital Optimization out of Local Minima and Saddle Points Toward Lower Energy
text, January 2017
- Vaucher, Alain C.; Reiher, Markus
- arXiv
A Density Functional Tight Binding Layer for Deep Learning of Chemical Hamiltonians
preprint, January 2018
- Li, Haichen; Collins, Christopher; Tanha, Matteus
- arXiv
Resolution limit of data-driven coarse-grained models spanning chemical space
text, January 2019
- Kanekal, Kiran H.; Bereau, Tristan
- arXiv
Inverse Design of Potential Singlet Fission Molecules using a Transfer Learning Based Approach
preprint, January 2020
- Subramanian, Akshay; Saha, Utkarsh; Sharma, Tejasvini
- arXiv
Basis set convergence and extrapolation of connected triple excitation contributions (T) in computational thermochemistry: the W4-17 benchmark with up to k functions
text, January 2021
- Martin, Jan M. L.
- arXiv