DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Strategies and Software for Machine Learning Accelerated Discovery in Transition Metal Chemistry

Journal Article · Industrial and Engineering Chemistry Research
[1]; [2]; [2]; [3]; [2]
  1. Massachusetts Institute of Technology (MIT), Cambridge, MA (United States); DOE/OSTI
  2. Massachusetts Institute of Technology (MIT), Cambridge, MA (United States)
  3. Massachusetts Institute of Technology (MIT), Cambridge, MA (United States); Eidgenoessische Technische Hochschule (ETH), Zurich (Switzerland)

Machine learning the electronic structure of open-shell transition metal complexes presents unique challenges, including robust and automated data set generation. In this report, we introduce tools that simplify data acquisition from density functional theory (DFT) and validation of trained machine learning models using the molSimplify automatic design (mAD) workflow. We demonstrate this workflow by training and comparing the performance of LASSO, kernel ridge regression (KRR), and artificial neural network (ANN) models using heuristic, topological revised autocorrelation (RAC) descriptors we have recently introduced for machine learning inorganic chemistry. On a series of open-shell transition metal complexes, we evaluate set-aside test errors of these models for predicting the HOMO level and HOMO–LUMO gap. The best-performing models are ANNs, which show test-set mean absolute errors of 0.15 eV on the HOMO level and 0.25 eV on the HOMO–LUMO gap. Poorly performing KRR models using the full 153-feature RAC set are improved to nearly the same performance as the ANNs when trained on down-selected subsets of 20–30 features. Analysis of the essential descriptors for HOMO level and HOMO–LUMO gap prediction, as well as comparison to subsets previously obtained for other properties, reveals the paramount importance of nonlocal, steric properties in determining frontier molecular orbital energetics. We demonstrate our model performance on diverse complexes and in the discovery of molecules with target HOMO–LUMO gaps from a large 15,000-molecule design space in minutes rather than the days that full DFT evaluation would require.
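
The RAC descriptors mentioned in the abstract are built by correlating atomic properties over the topological (bond-count) distances of a molecular graph. The sketch below illustrates the general idea under stated assumptions: it computes a product autocorrelation, P_d = Σ_i Σ_j P_i P_j δ(d_ij, d), and a difference form, Σ_i Σ_j (P_i − P_j) δ(d_ij, d), with the outer sum optionally restricted to a set of start atoms such as the metal. The function names, the five-atom toy graph, and the use of nuclear charge as the property are illustrative assumptions; this is not molSimplify's implementation or the exact 153-feature definition used in the paper.

```python
from collections import deque


def topological_distances(adjacency, start):
    """Breadth-first search giving the bond-count distance from start to every reachable atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for neighbor in adjacency[atom]:
            if neighbor not in dist:
                dist[neighbor] = dist[atom] + 1
                queue.append(neighbor)
    return dist


def autocorrelations(adjacency, prop, start_atoms, max_depth, kind="product"):
    """Autocorrelations of one atomic property for depths 0..max_depth (illustrative sketch).

    product:    sum over i in start_atoms and all j at distance d of P_i * P_j
    difference: sum over i in start_atoms and all j at distance d of (P_i - P_j)
    """
    values = [0.0] * (max_depth + 1)
    for i in start_atoms:
        for j, d in topological_distances(adjacency, i).items():
            if d > max_depth:
                continue  # atom j lies beyond the correlation depth
            if kind == "product":
                values[d] += prop[i] * prop[j]
            elif kind == "difference":
                values[d] += prop[i] - prop[j]
            else:
                raise ValueError(f"unknown kind: {kind!r}")
    return values


# Hypothetical toy fragment: Fe (atom 0) bound to two N atoms (1, 2),
# each N bound to one C atom (3, 4); hydrogens omitted for brevity.
adjacency = {0: [1, 2], 1: [0, 3], 2: [0, 4], 3: [1], 4: [2]}
nuclear_charge = {0: 26, 1: 7, 2: 7, 3: 6, 4: 6}

metal_product = autocorrelations(adjacency, nuclear_charge, [0], 2)
metal_difference = autocorrelations(adjacency, nuclear_charge, [0], 2, kind="difference")
whole_product = autocorrelations(adjacency, nuclear_charge, list(adjacency), 2)
print(metal_product)     # [676.0, 364.0, 312.0]
print(metal_difference)  # [0.0, 38.0, 40.0]
```

Restricting `start_atoms` to the metal gives metal-centered descriptors, while passing every atom recovers a conventional whole-molecule autocorrelation in which each atom pair is counted in both directions.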
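
The abstract reports that KRR models improve markedly when trained on down-selected feature subsets rather than the full RAC set. The following minimal, dependency-free sketch shows the mechanics of such a comparison: KRR with a Gaussian kernel solves (K + λI)α = y for dual coefficients α and predicts f(x) = Σ_i α_i k(x, x_i), and the mean absolute error (MAE) is the metric quoted in the abstract. The synthetic data, kernel width `gamma`, regularization `lam`, and the "first two features" subset are hypothetical; the paper's actual feature-selection procedure, hyperparameters, and data are not described in this record. In practice a library such as scikit-learn would normally replace the hand-written linear algebra.

```python
import math
import random


def gaussian_kernel(a, b, gamma):
    """k(a, b) = exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def select(rows, subset):
    """Keep only the feature columns listed in subset (a hypothetical down-selection)."""
    return [[row[k] for k in subset] for row in rows]


def fit_krr(X, y, gamma, lam):
    """Dual coefficients alpha from (K + lam * I) alpha = y."""
    n = len(X)
    K = [[gaussian_kernel(X[i], X[j], gamma) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return solve(K, y)


def predict_krr(X_train, alpha, X_new, gamma):
    """f(x) = sum_i alpha_i * k(x, x_i)."""
    return [sum(a * gaussian_kernel(x, xt, gamma) for a, xt in zip(alpha, X_train))
            for x in X_new]


def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Synthetic (hypothetical) data: 10 features, but the target depends only on the first two.
    rng = random.Random(0)
    X = [[rng.uniform(-1.0, 1.0) for _ in range(10)] for _ in range(80)]
    y = [row[0] ** 2 + math.sin(3.0 * row[1]) for row in X]
    X_train, X_test, y_train, y_test = X[:60], X[60:], y[:60], y[60:]
    for label, subset in [("all 10 features", list(range(10))), ("first 2 features", [0, 1])]:
        A_train, A_test = select(X_train, subset), select(X_test, subset)
        alpha = fit_krr(A_train, y_train, gamma=1.0, lam=1e-3)
        mae = mean_absolute_error(y_test, predict_krr(A_train, alpha, A_test, gamma=1.0))
        print(f"{label}: test MAE = {mae:.3f}")
```

Because the Gaussian kernel depends on distances computed over every included feature, irrelevant features can swamp the informative ones; on data like this, one would expect the two-feature model to reach the lower test MAE, a miniature version of why down-selection can help KRR.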
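
Once a model is trained, screening a design space for a target HOMO–LUMO gap reduces to evaluating the model on every candidate and keeping those whose prediction falls near the target; each evaluation is a cheap model call rather than a DFT calculation, which is what makes minutes-scale screening possible. A minimal sketch, assuming any callable predictor (for example, a trained KRR or ANN); the candidate names, the linear stand-in predictor, and the tolerance value are hypothetical. One reasonable but assumed choice is to set the tolerance near the model's test MAE, such as the 0.25 eV reported above for the ANN gap model.

```python
def screen_for_gap(candidates, predict_gap, target, tolerance):
    """Return (name, predicted_gap) pairs within tolerance of target, closest first."""
    hits = []
    for name, features in candidates:
        gap = predict_gap(features)  # one cheap model call stands in for one DFT calculation
        if abs(gap - target) <= tolerance:
            hits.append((name, gap))
    hits.sort(key=lambda hit: abs(hit[1] - target))
    return hits


def toy_model(features):
    """Hypothetical stand-in for a trained model: predicted gap in eV."""
    return 1.5 * features[0]


# Hypothetical candidates: (name, feature vector) pairs.
candidates = [("A", [1.0]), ("B", [2.0]), ("C", [3.0])]
print(screen_for_gap(candidates, toy_model, target=2.8, tolerance=1.5))
# [('B', 3.0), ('A', 1.5)]
```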

Research Organization:
Massachusetts Institute of Technology (MIT), Cambridge, MA (United States)
Sponsoring Organization:
Defense Advanced Research Projects Agency (DARPA); National Science Foundation (NSF); Office of Naval Research (ONR); USDOE Office of Science (SC)
Grant/Contract Number:
SC0018096
OSTI ID:
1612842
Journal Information:
Industrial and Engineering Chemistry Research, Vol. 57, Issue 42; ISSN 0888-5885
Publisher:
American Chemical Society (ACS)
Country of Publication:
United States
Language:
English