
Title: Reliable and explainable machine-learning methods for accelerated material discovery

Journal Article · npj Computational Materials

Abstract: Despite the impressive performance of machine learning (ML) in commercial applications, several unique challenges arise when applying ML in materials science. In this context, the contributions of this work are twofold. First, we identify common pitfalls of existing ML techniques when learning from underrepresented or imbalanced materials data. Specifically, we show that with imbalanced data, standard methods for assessing the quality of ML models break down and lead to misleading conclusions. Furthermore, we find that the model’s own confidence score cannot be trusted, and that model introspection methods (using simpler models) do not help because they result in a loss of predictive performance (the reliability-explainability trade-off). Second, to overcome these challenges, we propose a general-purpose explainable and reliable machine-learning framework. Specifically, we propose a generic pipeline that employs an ensemble of simpler models to reliably predict material properties. We also propose a transfer learning technique and show that the performance loss due to the models’ simplicity can be overcome by exploiting correlations among different material properties. We further propose a new evaluation metric and a trust score to better quantify the confidence in the predictions. To improve interpretability, we add a rationale generator component to our framework, which provides both model-level and decision-level explanations. Finally, we demonstrate the versatility of our technique on two applications: (1) predicting properties of crystalline compounds and (2) identifying potentially stable solar cell materials. We also point to some outstanding issues yet to be resolved for the successful application of ML in materials science.
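
The claim that standard quality measures become misleading on imbalanced data can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the labels are hypothetical (a 95:5 class split, not data from the article), the "model" is a trivial majority-class predictor, and balanced accuracy is a commonly used alternative metric, not the evaluation metric the article proposes.

```python
# Illustrative only: hypothetical labels, not data from the article.
# 1 = rare class of interest (for example "stable"), 0 = common class.

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred, label):
    """Fraction of samples with the given true label that were predicted as that label."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    total = sum(1 for t in y_true if t == label)
    return hits / total if total else 0.0

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally."""
    labels = sorted(set(y_true))
    return sum(recall(y_true, y_pred, c) for c in labels) / len(labels)

# Assumed imbalance: 95 common samples and 5 rare ones.
y_true = [0] * 95 + [1] * 5
# A trivial "model" that always predicts the common class.
y_pred = [0] * 100

print("accuracy:", accuracy(y_true, y_pred))
print("recall (rare class):", recall(y_true, y_pred, 1))
print("balanced accuracy:", balanced_accuracy(y_true, y_pred))
```

In this toy setting the plain accuracy looks high even though the predictor never identifies a single rare sample, while the per-class view exposes the failure. The article's own metric and trust score are described in the full text and are not reproduced here.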

Sponsoring Organization:
USDOE
OSTI ID:
1619662
Journal Information:
Journal Name: npj Computational Materials; Journal Volume: 5; Journal Issue: 1; ISSN 2057-3960
Publisher:
Nature Publishing Group
Country of Publication:
United Kingdom
Language:
English
