DOE PAGES · U.S. Department of Energy
Office of Scientific and Technical Information

Title: Expansion of bond dissociation prediction with machine learning to medicinally and environmentally relevant chemical space

Journal Article · Digital Discovery
DOI: https://doi.org/10.1039/d3dd00169e · OSTI ID:2203761

Bond dissociation energetics underpin the thermodynamics of chemical transformations in which bonds are broken or formed, and they can also be used to predict reaction rates and selectivities. Current machine learning (ML) models for predicting bond dissociation energy (BDE) largely cover only hydrogen and the second-row elements. This has restricted the applicability of ML-derived BDE predictions, particularly for molecules of medicinal relevance, since the heteroatoms S, Cl, F, P, Br, and I are common in approved pharmaceuticals. Atmospherically and environmentally relevant molecules containing multiple halogen atoms have been similarly inaccessible. In this study, we considerably expand the size, elemental composition, and bond types of an extensive BDE database and train a new ML BDE model that covers C, H, N, O, S, Cl, F, P, Br, and I. We curate a new quantum chemical dataset of 531 244 unique, zero-point-energy-inclusive homolytic dissociations of organic compounds. We investigate accuracy for out-of-sample molecules and use iterative training and testing cycles during model development to improve model accuracy. Targeted augmentation of the training data with as few as eight additional molecules reduced the prediction error for two groups. For datasets of pharmaceutically relevant molecules containing multiple C(sp²)–halogen bonds, the error fell from 5.7 to 0.8 kcal mol⁻¹. For polyhaloalkyl compounds with multiple C(sp³)–halogen bonds, it fell from 2.7 to 1.2 kcal mol⁻¹. Our updated and expanded model (ALFABET) achieves a mean absolute error of 0.6 kcal mol⁻¹ for both enthalpies and free energies relative to the quantum chemical ground truth. The graph-based representations used here outperform traditional cheminformatics features such as radial fingerprints. Including more expensive QM-derived parameters, such as optimized bond lengths, yields no discernible improvement in accuracy.
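A homolytic BDE is the energy difference between the two radical fragments (A• and B•) and the closed-shell parent (A–B): BDE = E(A•) + E(B•) − E(A–B). The minimal sketch below shows this arithmetic. It assumes that each input is a zero-point-energy-inclusive total energy in hartree from the same level of theory. The same difference applies if enthalpies or free energies are supplied instead. The function name `homolytic_bde` and the example energies are hypothetical and illustrative only. They are not values or code from the dataset or from ALFABET.

```python
# Approximate conversion from hartree to kcal/mol (thermochemical calorie).
HARTREE_TO_KCAL_MOL = 627.509


def homolytic_bde(e_parent: float, e_frag_a: float, e_frag_b: float) -> float:
    """Return a homolytic bond dissociation energy in kcal/mol.

    Hypothetical helper for illustration. Each argument is assumed to be a
    ZPE-inclusive total energy (hartree) computed at the same level of theory:
      e_parent  -- closed-shell parent molecule A-B
      e_frag_a  -- radical fragment A*
      e_frag_b  -- radical fragment B*
    BDE = E(A*) + E(B*) - E(A-B); positive when breaking the bond costs energy.
    """
    return (e_frag_a + e_frag_b - e_parent) * HARTREE_TO_KCAL_MOL


# Illustrative (made-up) energies, not taken from any dataset.
bde = homolytic_bde(e_parent=-100.0, e_frag_a=-60.0, e_frag_b=-39.85)
print(f"{bde:.2f} kcal/mol")  # prints 94.13 kcal/mol
```

Because the radicals and the parent must share one level of theory, any dataset of this kind is built by computing all three species consistently. The conversion factor only changes units.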
Finally, we illustrate high accuracy in external prediction tasks for large halogenated natural products, pharmaceutically relevant halogenated molecules, atmospherically important halocarbons, and polyfluoroalkyl substances related to environmental toxicity.
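Accuracy against quantum chemical reference values is typically summarized as a mean absolute error (MAE), the metric the abstract quotes for the overall model. The short sketch below assumes paired lists of predicted and reference BDEs in kcal/mol. The helper name `mean_absolute_error` and the numbers are hypothetical and illustrative only, not results from the study.

```python
from statistics import fmean


def mean_absolute_error(predicted: list[float], reference: list[float]) -> float:
    """Mean absolute error between paired predictions and reference values.

    Hypothetical helper for illustration; both lists are assumed to hold BDEs
    in kcal/mol for the same bonds, in the same order.
    """
    if not predicted or len(predicted) != len(reference):
        raise ValueError("need two non-empty lists of equal length")
    return fmean(abs(p - r) for p, r in zip(predicted, reference))


# Made-up predictions vs. made-up quantum chemical reference values.
pred = [90.0, 100.5, 80.0]
ref = [90.5, 100.0, 81.0]
mae = mean_absolute_error(pred, ref)
print(f"MAE = {mae:.2f} kcal/mol")  # prints MAE = 0.67 kcal/mol
```

MAE weights every bond equally. A few large outliers, such as poorly covered C–halogen bonds, can therefore hide behind a small average. This is one reason to report errors separately for targeted subsets.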

Research Organization:
National Renewable Energy Laboratory (NREL), Golden, CO (United States)
Sponsoring Organization:
USDOE; National Science Foundation (NSF)
Grant/Contract Number:
AC36-08GO28308; CHE–2202693; 2201538
OSTI ID:
2203761
Alternate ID(s):
OSTI ID: 2279169
Report Number(s):
NREL/JA-2700-88470; MainId:89249; UUID:b4f8df09-0031-4732-9e03-debc76630278; MainAdminID:71473
Journal Information:
Digital Discovery, Vol. 2, Issue 6; ISSN 2635-098X
Publisher:
Royal Society of Chemistry
Country of Publication:
United States
Language:
English
