Graph-Based Approaches for Predicting Solvation Energy in Multiple Solvents: Open Datasets and Machine Learning Models
Abstract
The solvation properties of molecules, often estimated using quantum chemical simulations, are important in the synthesis of energy storage materials, drugs, and industrial chemicals. Here, we develop machine learning models of solvation energies to replace expensive quantum chemistry calculations with inexpensive-to-compute message-passing neural network models that require only the molecular graph as inputs. Our models are trained on a new database of solvation energies for 130,258 molecules taken from the QM9 dataset computed in five solvents (acetone, ethanol, acetonitrile, dimethyl sulfoxide, and water) via an implicit solvent model. Our best model achieves a mean absolute error of 0.5 kcal/mol for molecules with nine or fewer non-hydrogen atoms and 1 kcal/mol for molecules with between 10 and 14 non-hydrogen atoms. We make the entire dataset of 651,290 computed entries openly available and provide simple web and programmatic interfaces to enable others to run our solvation energy model on new molecules. This model calculates the solvation energies for molecules using only the SMILES string and also provides an estimate of whether each molecule is within the domain of applicability of our model. We envision that the dataset and models will provide the functionality needed for the rapid screening of large chemical spaces to discover improved molecules for many applications.
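To make the phrase "message-passing neural network models that require only the molecular graph as inputs" concrete, the following is a minimal toy sketch, not the authors' architecture. The per-element feature values, the function names, and the sum-based update and readout are all assumptions chosen for illustration; a trained model would use learned, vector-valued atom and bond states.

```python
# Illustrative only: a toy message-passing readout, NOT the model from the paper.
from typing import Dict, List, Tuple

# Hypothetical scalar "features" per element (assumed values, not learned).
ATOM_FEATURE: Dict[str, float] = {"C": 1.0, "N": 1.5, "O": 2.0}


def message_passing_step(
    bonds: List[Tuple[int, int]], states: List[float]
) -> List[float]:
    """Update each atom state by adding the sum of its neighbors' states."""
    incoming = [0.0] * len(states)
    for i, j in bonds:  # bonds are undirected, so messages flow both ways
        incoming[i] += states[j]
        incoming[j] += states[i]
    return [s + m for s, m in zip(states, incoming)]


def readout(atoms: List[str], bonds: List[Tuple[int, int]], n_steps: int = 2) -> float:
    """Run n_steps of message passing, then sum atom states into one number."""
    states = [ATOM_FEATURE[a] for a in atoms]
    for _ in range(n_steps):
        states = message_passing_step(bonds, states)
    return sum(states)


# Heavy-atom graph of ethanol (SMILES "CCO"): bonds C0-C1 and C1-O2.
ethanol_atoms = ["C", "C", "O"]
ethanol_bonds = [(0, 1), (1, 2)]
ethanol_score = readout(ethanol_atoms, ethanol_bonds)
print(ethanol_score)
```

The output is a single molecule-level number built purely from atoms and bonds, which is the sense in which such a model needs no quantum chemical input. The value itself carries no physical meaning here.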
- Authors:
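- Ward, Logan; Dandu, Naveen; Blaiszik, Ben; Narayanan, Badri; Assary, Rajeev S.; Redfern, Paul C.; Foster, Ian; Curtiss, Larry A.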
- Argonne National Lab. (ANL), Argonne, IL (United States)
- Argonne National Lab. (ANL), Argonne, IL (United States); Univ. of Chicago, IL (United States). Globus
- Argonne National Lab. (ANL), Argonne, IL (United States); Univ. of Louisville, KY (United States)
- Argonne National Lab. (ANL), Argonne, IL (United States); Univ. of Chicago, IL (United States)
- Publication Date:
- 2021-06-30
- Research Org.:
- Argonne National Laboratory (ANL), Argonne, IL (United States)
- Sponsoring Org.:
- USDOE Office of Science (SC), Basic Energy Sciences (BES); US Department of Commerce, NIST; National Science Foundation (NSF)
- OSTI Identifier:
- 1854527
- Grant/Contract Number:
- AC02-06CH11357; NSF-1636950; 70NANB14H012
- Resource Type:
- Accepted Manuscript
- Journal Name:
- Journal of Physical Chemistry. A, Molecules, Spectroscopy, Kinetics, Environment, and General Theory
- Additional Journal Information:
- Journal Volume: 125; Journal Issue: 27; Journal ID: ISSN 1089-5639
- Publisher:
- American Chemical Society
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 36 MATERIALS SCIENCE; 42 ENGINEERING; 25 ENERGY STORAGE; 97 MATHEMATICS AND COMPUTING
Citation Formats
Ward, Logan, Dandu, Naveen, Blaiszik, Ben, Narayanan, Badri, Assary, Rajeev S., Redfern, Paul C., Foster, Ian, and Curtiss, Larry A. Graph-Based Approaches for Predicting Solvation Energy in Multiple Solvents: Open Datasets and Machine Learning Models. United States: N. p., 2021. Web. doi:10.1021/acs.jpca.1c01960.
Ward, Logan, Dandu, Naveen, Blaiszik, Ben, Narayanan, Badri, Assary, Rajeev S., Redfern, Paul C., Foster, Ian, & Curtiss, Larry A. Graph-Based Approaches for Predicting Solvation Energy in Multiple Solvents: Open Datasets and Machine Learning Models. United States. https://doi.org/10.1021/acs.jpca.1c01960
Ward, Logan, Dandu, Naveen, Blaiszik, Ben, Narayanan, Badri, Assary, Rajeev S., Redfern, Paul C., Foster, Ian, and Curtiss, Larry A. 2021. "Graph-Based Approaches for Predicting Solvation Energy in Multiple Solvents: Open Datasets and Machine Learning Models". United States. https://doi.org/10.1021/acs.jpca.1c01960. https://www.osti.gov/servlets/purl/1854527.
@article{osti_1854527,
title = {Graph-Based Approaches for Predicting Solvation Energy in Multiple Solvents: Open Datasets and Machine Learning Models},
author = {Ward, Logan and Dandu, Naveen and Blaiszik, Ben and Narayanan, Badri and Assary, Rajeev S. and Redfern, Paul C. and Foster, Ian and Curtiss, Larry A.},
abstractNote = {The solvation properties of molecules, often estimated using quantum chemical simulations, are important in the synthesis of energy storage materials, drugs, and industrial chemicals. Here, we develop machine learning models of solvation energies to replace expensive quantum chemistry calculations with inexpensive-to-compute message-passing neural network models that require only the molecular graph as inputs. Our models are trained on a new database of solvation energies for 130,258 molecules taken from the QM9 dataset computed in five solvents (acetone, ethanol, acetonitrile, dimethyl sulfoxide, and water) via an implicit solvent model. Our best model achieves a mean absolute error of 0.5 kcal/mol for molecules with nine or fewer non-hydrogen atoms and 1 kcal/mol for molecules with between 10 and 14 non-hydrogen atoms. We make the entire dataset of 651,290 computed entries openly available and provide simple web and programmatic interfaces to enable others to run our solvation energy model on new molecules. This model calculates the solvation energies for molecules using only the SMILES string and also provides an estimate of whether each molecule is within the domain of applicability of our model. We envision that the dataset and models will provide the functionality needed for the rapid screening of large chemical spaces to discover improved molecules for many applications.},
doi = {10.1021/acs.jpca.1c01960},
journal = {Journal of Physical Chemistry. A, Molecules, Spectroscopy, Kinetics, Environment, and General Theory},
number = 27,
volume = 125,
place = {United States},
year = {2021},
month = {6}
}