DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Graph-Based Approaches for Predicting Solvation Energy in Multiple Solvents: Open Datasets and Machine Learning Models

Abstract

The solvation properties of molecules, often estimated using quantum chemical simulations, are important in the synthesis of energy storage materials, drugs, and industrial chemicals. Here, we develop machine learning models of solvation energies to replace expensive quantum chemistry calculations with inexpensive-to-compute message-passing neural network models that require only the molecular graph as inputs. Our models are trained on a new database of solvation energies for 130,258 molecules taken from the QM9 dataset computed in five solvents (acetone, ethanol, acetonitrile, dimethyl sulfoxide, and water) via an implicit solvent model. Our best model achieves a mean absolute error of 0.5 kcal/mol for molecules with nine or fewer non-hydrogen atoms and 1 kcal/mol for molecules with between 10 and 14 non-hydrogen atoms. We make the entire dataset of 651,290 computed entries openly available and provide simple web and programmatic interfaces to enable others to run our solvation energy model on new molecules. This model calculates the solvation energies for molecules using only the SMILES string and also provides an estimate of whether each molecule is within the domain of applicability of our model. We envision that the dataset and models will provide the functionality needed for the rapid screening of large chemical spaces to discover improved molecules for many applications.
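The figures quoted in the abstract can be checked with a short sketch. The code below is only an illustration, not the authors' code: it confirms that 130,258 molecules in five solvents give 651,290 entries, and it shows how a mean absolute error (MAE) split by molecule size might be computed. The function names, the `small_max` threshold parameter, and the example values are all hypothetical.

```python
from statistics import mean

# Figures quoted in the abstract.
SOLVENTS = ("acetone", "ethanol", "acetonitrile", "dimethyl sulfoxide", "water")
N_MOLECULES = 130_258


def total_entries(n_molecules, solvents):
    """One computed solvation energy per (molecule, solvent) pair."""
    return n_molecules * len(solvents)


def mean_absolute_error(reference, predicted):
    """Average of |reference - predicted|, in the inputs' units (kcal/mol here)."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    return mean(abs(r - p) for r, p in zip(reference, predicted))


def mae_by_size(records, small_max=9):
    """Split (n_heavy_atoms, reference, predicted) records at small_max
    and return the MAE of each non-empty group."""
    groups = {"small": [], "large": []}
    for n_heavy, ref, pred in records:
        key = "small" if n_heavy <= small_max else "large"
        groups[key].append((ref, pred))
    return {
        key: mean_absolute_error([r for r, _ in pairs], [p for _, p in pairs])
        for key, pairs in groups.items()
        if pairs
    }


# Hypothetical example values, not taken from the dataset.
EXAMPLE_RECORDS = [(7, -5.0, -4.5), (9, -3.0, -3.5), (12, -8.0, -7.0)]

if __name__ == "__main__":
    print(total_entries(N_MOLECULES, SOLVENTS))
    print(mae_by_size(EXAMPLE_RECORDS))
```

Reporting the error separately for each size group, as the abstract does, keeps performance on molecules larger than those in QM9 from being averaged together with performance on molecules of the same size as the training data.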

Authors:
Ward, Logan [1]; Dandu, Naveen [1]; Blaiszik, Ben [2]; Narayanan, Badri [3]; Assary, Rajeev S. [1]; Redfern, Paul C. [1]; Foster, Ian [4]; Curtiss, Larry A. [1]
  1. Argonne National Lab. (ANL), Argonne, IL (United States)
  2. Argonne National Lab. (ANL), Argonne, IL (United States); Univ. of Chicago, IL (United States). Globus
  3. Argonne National Lab. (ANL), Argonne, IL (United States); Univ. of Louisville, KY (United States)
  4. Argonne National Lab. (ANL), Argonne, IL (United States); Univ. of Chicago, IL (United States)
Publication Date:
2021-06-30
Research Org.:
Argonne National Laboratory (ANL), Argonne, IL (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Basic Energy Sciences (BES); US Department of Commerce, NIST; National Science Foundation (NSF)
OSTI Identifier:
1854527
Grant/Contract Number:  
AC02-06CH11357; NSF-1636950; 70NANB14H012
Resource Type:
Accepted Manuscript
Journal Name:
Journal of Physical Chemistry. A, Molecules, Spectroscopy, Kinetics, Environment, and General Theory
Additional Journal Information:
Journal Volume: 125; Journal Issue: 27; Journal ID: ISSN 1089-5639
Publisher:
American Chemical Society
Country of Publication:
United States
Language:
English
Subject:
36 MATERIALS SCIENCE; 42 ENGINEERING; 25 ENERGY STORAGE; 97 MATHEMATICS AND COMPUTING

Citation Formats

Ward, Logan, Dandu, Naveen, Blaiszik, Ben, Narayanan, Badri, Assary, Rajeev S., Redfern, Paul C., Foster, Ian, and Curtiss, Larry A. Graph-Based Approaches for Predicting Solvation Energy in Multiple Solvents: Open Datasets and Machine Learning Models. United States: N. p., 2021. Web. doi:10.1021/acs.jpca.1c01960.
Ward, Logan, Dandu, Naveen, Blaiszik, Ben, Narayanan, Badri, Assary, Rajeev S., Redfern, Paul C., Foster, Ian, & Curtiss, Larry A. Graph-Based Approaches for Predicting Solvation Energy in Multiple Solvents: Open Datasets and Machine Learning Models. United States. https://doi.org/10.1021/acs.jpca.1c01960
Ward, Logan, Dandu, Naveen, Blaiszik, Ben, Narayanan, Badri, Assary, Rajeev S., Redfern, Paul C., Foster, Ian, and Curtiss, Larry A. 2021. "Graph-Based Approaches for Predicting Solvation Energy in Multiple Solvents: Open Datasets and Machine Learning Models". United States. https://doi.org/10.1021/acs.jpca.1c01960. https://www.osti.gov/servlets/purl/1854527.
@article{osti_1854527,
title = {Graph-Based Approaches for Predicting Solvation Energy in Multiple Solvents: Open Datasets and Machine Learning Models},
author = {Ward, Logan and Dandu, Naveen and Blaiszik, Ben and Narayanan, Badri and Assary, Rajeev S. and Redfern, Paul C. and Foster, Ian and Curtiss, Larry A.},
abstractNote = {The solvation properties of molecules, often estimated using quantum chemical simulations, are important in the synthesis of energy storage materials, drugs, and industrial chemicals. Here, we develop machine learning models of solvation energies to replace expensive quantum chemistry calculations with inexpensive-to-compute message-passing neural network models that require only the molecular graph as inputs. Our models are trained on a new database of solvation energies for 130,258 molecules taken from the QM9 dataset computed in five solvents (acetone, ethanol, acetonitrile, dimethyl sulfoxide, and water) via an implicit solvent model. Our best model achieves a mean absolute error of 0.5 kcal/mol for molecules with nine or fewer non-hydrogen atoms and 1 kcal/mol for molecules with between 10 and 14 non-hydrogen atoms. We make the entire dataset of 651,290 computed entries openly available and provide simple web and programmatic interfaces to enable others to run our solvation energy model on new molecules. This model calculates the solvation energies for molecules using only the SMILES string and also provides an estimate of whether each molecule is within the domain of applicability of our model. We envision that the dataset and models will provide the functionality needed for the rapid screening of large chemical spaces to discover improved molecules for many applications.},
doi = {10.1021/acs.jpca.1c01960},
journal = {Journal of Physical Chemistry. A, Molecules, Spectroscopy, Kinetics, Environment, and General Theory},
number = 27,
volume = 125,
place = {United States},
year = {2021},
month = {jun}
}

Works referenced in this record:

Dataset: Datasets and Machine Learning Models for Accurate Estimates of Solvation Energy in Multiple Solvents
dataset, January 2021

  • Ward, Logan; Dandu, Naveen; Blaiszik, Ben
  • Materials Data Facility
  • DOI: 10.18126/tvys-3xcr
