U.S. Department of Energy
Office of Scientific and Technical Information

Representations and strategies for transferable machine learning improve model performance in chemical discovery

Journal Article · Journal of Chemical Physics
DOI: https://doi.org/10.1063/5.0082964 · OSTI ID: 1979037
 [1];  [2];  [3];  [2];  [2];  [2]
  1. Massachusetts Institute of Technology (MIT), Cambridge, MA (United States); Univ. of Minnesota, Minneapolis, MN (United States)
  2. Massachusetts Institute of Technology (MIT), Cambridge, MA (United States)
  3. Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA

Strategies for machine-learning (ML)-accelerated discovery that are general across material composition spaces are essential, but demonstrations of ML have been primarily limited to narrow composition variations. By addressing the scarcity of data in promising regions of chemical space for challenging targets such as open-shell transition-metal complexes, general representations and transferable ML models that leverage known relationships in existing data will accelerate discovery. Over a large set (~1000) of isovalent transition-metal complexes, we quantify evident relationships for different properties (i.e., spin-splitting and ligand dissociation) between rows of the Periodic Table (i.e., 3d/4d metals and 2p/3p ligands). We demonstrate an extension to the graph-based revised autocorrelation (RAC) representation (i.e., eRAC) that incorporates the group number alongside the nuclear charge heuristic that otherwise overestimates dissimilarity of isovalent complexes. To address the common challenge of discovery in a new space where data are limited, we introduce a transfer learning approach in which we seed models trained on a large amount of data from one row of the Periodic Table with a small number of data points from the additional row. We demonstrate the synergistic value of the eRACs alongside this transfer learning strategy to consistently improve model performance. Analysis of these models highlights how the approach succeeds by reordering the distances between complexes to be more consistent with the Periodic Table, a property we expect to be broadly useful for other material domains.
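
The point about the nuclear charge heuristic can be illustrated with a minimal sketch. It assumes a simplified metal-centered product autocorrelation (the metal's property multiplied by the property of each atom a fixed number of bonds away, then summed) and two bare six-coordinate toy complexes. The function names and the toy complexes are hypothetical and chosen only for illustration; this is not the authors' implementation, and the full RAC/eRAC feature set is considerably larger.

```python
from collections import deque

# Standard periodic-table values for the four elements used in the toy example.
NUCLEAR_CHARGE = {"Fe": 26, "Ru": 44, "N": 7, "P": 15}
GROUP_NUMBER = {"Fe": 8, "Ru": 8, "N": 15, "P": 15}


def bond_path_distances(bonds, start):
    """Breadth-first search giving the bond-path distance from `start` to each atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for neighbor in bonds[atom]:
            if neighbor not in dist:
                dist[neighbor] = dist[atom] + 1
                queue.append(neighbor)
    return dist


def metal_autocorrelation(symbols, bonds, prop, depth, metal=0):
    """Sum of prop(metal) * prop(atom) over atoms exactly `depth` bonds from the metal."""
    dist = bond_path_distances(bonds, metal)
    return sum(
        prop[symbols[metal]] * prop[symbols[j]]
        for j, d in dist.items()
        if d == depth
    )


def toy_complex(metal, donor):
    """Hypothetical complex: one metal (index 0) bonded to six identical donor atoms."""
    symbols = [metal] + [donor] * 6
    bonds = {0: list(range(1, 7))}
    bonds.update({i: [0] for i in range(1, 7)})
    return symbols, bonds


# Isovalent pair: 3d metal with 2p donors versus 4d metal with 3p donors.
fe_complex = toy_complex("Fe", "N")
ru_complex = toy_complex("Ru", "P")

for depth in (0, 1):
    z_pair = [metal_autocorrelation(*c, NUCLEAR_CHARGE, depth) for c in (fe_complex, ru_complex)]
    g_pair = [metal_autocorrelation(*c, GROUP_NUMBER, depth) for c in (fe_complex, ru_complex)]
    print(f"depth {depth}: nuclear charge {z_pair}, group number {g_pair}")
```

In this toy case the nuclear-charge features of the two complexes differ severalfold, while the group-number features coincide, which is the sense in which adding the group number lets a distance-based model treat isovalent complexes as neighbors.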

Research Organization:
University of Minnesota, Minneapolis, MN (United States)
Sponsoring Organization:
USDOE Office of Science (SC); Office of Naval Research (ONR); Defense Advanced Research Projects Agency (DARPA); National Science Foundation (NSF); Burroughs Wellcome Fund; AAAS Marion Milligan Mason Award
Grant/Contract Number:
SC0012702
OSTI ID:
1979037
Alternate ID(s):
OSTI ID: 1845171
Journal Information:
Journal Name: Journal of Chemical Physics; Journal Volume: 156; Journal Issue: 7; ISSN 0021-9606
Publisher:
American Institute of Physics (AIP)
Country of Publication:
United States
Language:
English
