Title: Fourier series of atomic radial distribution functions: A molecular fingerprint for machine learning models of quantum chemical properties

Abstract

We introduce a fingerprint representation of molecules based on a Fourier series of atomic radial distribution functions. This fingerprint is unique (except for chirality), continuous, and differentiable with respect to atomic coordinates and nuclear charges. It is invariant with respect to translation, rotation, and nuclear permutation, and requires no preconceived knowledge about chemical bonding, topology, or electronic orbitals. As such, it meets many important criteria for a good molecular representation, suggesting its usefulness for machine learning models of molecular properties trained across chemical compound space. To assess the performance of this new descriptor, we have trained machine learning models of molecular enthalpies of atomization for training sets with up to 10 k organic molecules, drawn at random from a published set of 134 k organic molecules with an average atomization enthalpy of over 1770 kcal/mol. We validate the descriptor on all remaining molecules of the 134 k set. For a training set of 10 k molecules, the fingerprint descriptor achieves a mean absolute error of 8.0 kcal/mol. This is slightly worse than the performance attained using the Coulomb matrix, another popular alternative, reaching 6.2 kcal/mol for the same training and test sets. (c) 2015 Wiley Periodicals, Inc.
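The invariance properties claimed in the abstract can be illustrated with a small sketch. The code below is not the descriptor defined in the paper: it is a toy, assumed construction (hypothetical names `pair_distances` and `toy_fingerprint`, an assumed Gaussian width `sigma`, cutoff `r_max`, and a plain cosine series) that builds a charge-weighted radial distribution function from interatomic distances and projects it onto a few Fourier terms. Because it depends only on pairwise distances and sums over all pairs, it is unchanged by translation, rotation, and atom reordering, which is the property the example checks.

```python
import math

def pair_distances(coords, charges):
    """Return (distance, Z_i * Z_j) for every unordered atom pair."""
    pairs = []
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            pairs.append((math.dist(coords[i], coords[j]), charges[i] * charges[j]))
    return pairs

def toy_fingerprint(coords, charges, n_terms=8, r_max=10.0, sigma=0.2, n_grid=200):
    """Cosine-series coefficients of a Gaussian-smeared, charge-weighted
    radial distribution function on [0, r_max].
    Illustrative assumption only, not the paper's definition."""
    pairs = pair_distances(coords, charges)
    dr = r_max / n_grid
    grid = [(k + 0.5) * dr for k in range(n_grid)]  # midpoint rule
    rdf = [
        sum(w * math.exp(-((r - d) ** 2) / (2.0 * sigma ** 2)) for d, w in pairs)
        for r in grid
    ]
    return [
        sum(g * math.cos(math.pi * n * r / r_max) for r, g in zip(grid, rdf)) * dr
        for n in range(n_terms)
    ]

def rotate_z(point, angle):
    """Rotate a 3D point about the z axis."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)

# Approximate water geometry in angstrom (illustrative values).
water = [(0.0, 0.0, 0.117), (0.0, 0.757, -0.469), (0.0, -0.757, -0.469)]
charges = [8, 1, 1]

# Rotate, translate, then reorder the atoms.
shift = (1.0, -2.0, 3.0)
moved = [tuple(a + b for a, b in zip(rotate_z(p, 0.7), shift)) for p in water]
order = [2, 0, 1]
moved_perm = [moved[i] for i in order]
charges_perm = [charges[i] for i in order]

fp_a = toy_fingerprint(water, charges)
fp_b = toy_fingerprint(moved_perm, charges_perm)
same = all(math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-9) for a, b in zip(fp_a, fp_b))
print("invariant under translation, rotation, permutation:", same)
```

Comparing with a tolerance rather than exact equality is deliberate, since the rigid motion and the changed summation order introduce floating-point rounding differences.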

Authors:
 von Lilienfeld, O. Anatole [1]; Ramakrishnan, Raghunathan [2]; Rupp, Matthias [2]; Knoll, Aaron [3]
  1. Department of Chemistry, Institute of Physical Chemistry and National Center for Computational Design and Discovery of Novel Materials, University of Basel, Basel Switzerland; Argonne Leadership Computing Facility, Argonne National Laboratory, 9700 S. Cass Avenue Lemont Illinois 60439
  2. Department of Chemistry, Institute of Physical Chemistry and National Center for Computational Design and Discovery of Novel Materials, University of Basel, Basel Switzerland
  3. Mathematics and Computer Science Division, Argonne National Laboratory, Argonne Illinois 60439; Texas Advanced Computing Center, University of Texas, Austin Texas
Publication Date:
April 2015
Research Org.:
Argonne National Lab. (ANL), Argonne, IL (United States)
Sponsoring Org.:
Argonne National Laboratory - Argonne Leadership Computing Facility; Swiss National Science Foundation (SNSF)
OSTI Identifier:
1392322
DOE Contract Number:  
AC02-06CH11357
Resource Type:
Journal Article
Journal Name:
International Journal of Quantum Chemistry
Additional Journal Information:
Journal Volume: 115; Journal Issue: 16; Journal ID: ISSN 0020-7608
Publisher:
Wiley
Country of Publication:
United States
Language:
English

Citation Formats

von Lilienfeld, O. Anatole, Ramakrishnan, Raghunathan, Rupp, Matthias, and Knoll, Aaron. Fourier series of atomic radial distribution functions: A molecular fingerprint for machine learning models of quantum chemical properties. United States: N. p., 2015. Web. doi:10.1002/qua.24912.
von Lilienfeld, O. Anatole, Ramakrishnan, Raghunathan, Rupp, Matthias, & Knoll, Aaron. Fourier series of atomic radial distribution functions: A molecular fingerprint for machine learning models of quantum chemical properties. United States. doi:10.1002/qua.24912.
von Lilienfeld, O. Anatole, Ramakrishnan, Raghunathan, Rupp, Matthias, and Knoll, Aaron. 2015. "Fourier series of atomic radial distribution functions: A molecular fingerprint for machine learning models of quantum chemical properties". United States. doi:10.1002/qua.24912.
@article{osti_1392322,
title = {Fourier series of atomic radial distribution functions: A molecular fingerprint for machine learning models of quantum chemical properties},
author = {von Lilienfeld, O. Anatole and Ramakrishnan, Raghunathan and Rupp, Matthias and Knoll, Aaron},
doi = {10.1002/qua.24912},
journal = {International Journal of Quantum Chemistry},
issn = {0020-7608},
number = 16,
volume = 115,
place = {United States},
year = {2015},
month = {4}
}
