OSTI.GOV: U.S. Department of Energy, Office of Scientific and Technical Information

Title: Importance of Engineered and Learned Molecular Representations in Predicting Organic Reactivity, Selectivity, and Chemical Properties

Journal Article · Accounts of Chemical Research

Machine-readable chemical structure representations are foundational in all attempts to harness machine learning for the prediction of reactivities, selectivities, and chemical properties directly from molecular structure. The featurization of discrete chemical structures into a continuous vector space is a critical phase undertaken before model selection, and the development of new ways to quantitatively encode molecules is an active area of research. Here, we highlight the application and suitability of different representations, from expert-guided “engineered” descriptors to automatically “learned” features, in different prediction tasks relevant to organic and organometallic chemistry, where differing amounts of training data are available. These tasks include statistical models of stereo- and enantioselectivity, thermochemistry, and kinetics developed using experimental and quantum chemical data. The use of expert-guided molecular descriptors provides an opportunity to incorporate chemical knowledge, domain expertise, and physical constraints into statistical modeling. In applications to stereoselective organic and organometallic catalysis, where data sets may be relatively small and 3D-geometries and conformations play an important role, mechanistically informed features can be used successfully to obtain predictive statistical models that are also chemically interpretable. We provide an overview of several recent applications of this approach to obtain quantitative models for reactivity and selectivity, where topological descriptors, quantum mechanical calculations of electronic and steric properties, along with conformational ensembles, all feature as essential ingredients of the molecular representations used. Alternatively, more flexible, general-purpose molecular representations such as attributed molecular graphs can be used with machine learning approaches to learn the complex relationship between a structure and prediction target. 
This approach has the potential to outperform more traditional representation methods such as “hand-crafted” molecular descriptors, particularly as data set sizes grow. One area where this is particularly relevant is in the use of large sets of quantum mechanical data to train quantitative structure–property relationships. A general approach toward curating useful data sets and training highly accurate graph neural network models is discussed in the context of organic bond dissociation enthalpies, where this strategy outperforms regression using precomputed descriptors. Finally, we describe how graph neural network predictions can be incorporated into mechanistically informed statistical models of chemical reactivity and selectivity. Once trained, this approach avoids the expensive computational overhead associated with quantum mechanical calculations, while maintaining chemical interpretability. We illustrate examples for which fast predictions of bond dissociation enthalpy and of the identities of radicals formed through cleavage of a molecule’s weakest bond are used in simple physical models of site-selectivity and reactivity.
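
The last idea in the abstract, feeding predicted bond dissociation enthalpies (BDEs) into a simple physical model of site-selectivity, can be illustrated with a minimal sketch. This is not the authors' model. It assumes that per-site BDEs are already available (for example from a trained predictor) and that the barrier difference between sites scales linearly with the BDE difference through an assumed Evans-Polanyi-type coefficient `alpha`. The function names, the value of `alpha`, and the BDE values are hypothetical placeholders chosen only for illustration.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def weakest_bond(bde_by_site):
    """Return the (site, BDE) pair with the lowest bond dissociation enthalpy."""
    return min(bde_by_site.items(), key=lambda item: item[1])


def site_selectivity(bde_by_site, alpha=0.5, temperature=298.15):
    """Estimate relative site reactivity from BDEs (kcal/mol).

    Assumption (not from the article): the activation-energy difference
    between two sites is alpha times their BDE difference, so relative
    rates follow exp(-alpha * dBDE / RT). Returns fractions summing to 1.
    """
    reference = min(bde_by_site.values())
    weights = {
        site: math.exp(-alpha * (bde - reference) / (R_KCAL * temperature))
        for site, bde in bde_by_site.items()
    }
    total = sum(weights.values())
    return {site: weight / total for site, weight in weights.items()}


# Hypothetical placeholder BDEs in kcal/mol, NOT taken from the article.
example_bdes = {"C1-H": 98.0, "C2-H": 95.0, "C3-H": 89.0}

site, bde = weakest_bond(example_bdes)
fractions = site_selectivity(example_bdes)
print(f"Weakest bond: {site} ({bde:.1f} kcal/mol)")
for name, fraction in sorted(fractions.items(), key=lambda item: -item[1]):
    print(f"{name}: predicted fraction {fraction:.3f}")
```

In this sketch the weakest bond always receives the largest predicted fraction, and `alpha` controls how sharply selectivity responds to BDE differences. In practice such a coefficient would have to be fitted to experimental selectivity data.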

Research Organization:
National Renewable Energy Lab. (NREL), Golden, CO (United States)
Sponsoring Organization:
USDOE Office of Energy Efficiency and Renewable Energy (EERE); National Science Foundation (NSF)
Grant/Contract Number:
AC36-08GO28308; CHE-1925607
OSTI ID:
1768320
Report Number(s):
NREL/JA-2700-79214; MainId:33440; UUID:e32eae40-d575-4047-85b6-abb218c57d0e; MainAdminID:19752
Journal Information:
Accounts of Chemical Research, Vol. 54, Issue 4; ISSN 0001-4842
Publisher:
American Chemical Society
Country of Publication:
United States
Language:
English
