DOE PAGES · U.S. Department of Energy
Office of Scientific and Technical Information

Title: Emin: A First-Principles Thermochemical Descriptor for Predicting Molecular Synthesizability

Journal Article · Journal of Chemical Information and Modeling

Predicting the synthesizability of a new molecule remains an unsolved challenge that chemists have long tackled with heuristic approaches. In this study, we report a new method for predicting synthesizability using a simple yet accurate thermochemical descriptor. We introduce Emin, the energy difference between a molecule and its lowest-energy constitutional isomer, as a synthesizability predictor that is accurate, physically meaningful, and based on first principles. We apply Emin to 134,000 molecules in the QM9 data set and find that Emin is accurate when used alone and reduces incorrect predictions of "synthesizable" by up to 52% when used to augment commonly used prediction methods. Our work illustrates how first-principles thermochemistry and heuristic approximations for molecular stability are complementary, opening a new direction for synthesizability prediction methods.
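The definition of Emin given in the abstract can be illustrated with a minimal sketch. It assumes that every constitutional isomer of interest is already present in the input and that isomers are identified by sharing a molecular formula. The function name `compute_emin`, the tuple layout, and the energy values are hypothetical and chosen only for illustration; they are not taken from the article or from QM9.

```python
def compute_emin(molecules):
    """Return Emin for each molecule: its energy minus the lowest energy
    found among molecules with the same molecular formula.

    molecules: iterable of (name, formula, energy) tuples. Energies must
    share one unit and one level of theory (an assumption of this sketch).
    """
    molecules = list(molecules)  # allow two passes over any iterable

    # First pass: lowest energy seen for each molecular formula.
    lowest = {}
    for _, formula, energy in molecules:
        if formula not in lowest or energy < lowest[formula]:
            lowest[formula] = energy

    # Second pass: energy above the lowest-energy isomer.
    return {name: energy - lowest[formula] for name, formula, energy in molecules}


# Hypothetical relative energies (illustrative numbers, not computed data).
example = [
    ("ethanol", "C2H6O", 0.0),
    ("dimethyl ether", "C2H6O", 12.0),
    ("water", "H2O", -5.0),
]
emin = compute_emin(example)
```

In this sketch the lowest-energy isomer of each formula has Emin equal to zero, and every other isomer has a positive Emin. A molecule with no other isomer in the input also gets zero, which is why the result depends on how completely the isomer set is enumerated.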

Research Organization:
Argonne National Laboratory (ANL), Argonne, IL (United States)
Sponsoring Organization:
USDOE Laboratory Directed Research and Development (LDRD) Program; USDOE Office of Science (SC)
Grant/Contract Number:
AC02-06CH11357
OSTI ID:
2475465
Journal Information:
Journal Name: Journal of Chemical Information and Modeling; Journal Volume: 64; Journal Issue: 4; ISSN 1549-9596
Publisher:
American Chemical Society
Country of Publication:
United States
Language:
English
