DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Reliable and explainable machine-learning methods for accelerated material discovery

Abstract

Despite ML’s impressive performance in commercial applications, several unique challenges exist when applying ML in materials science applications. In such a context, the contributions of this work are twofold. First, we identify common pitfalls of existing ML techniques when learning from underrepresented/imbalanced material data. Specifically, we show that with imbalanced data, standard methods for assessing quality of ML models break down and lead to misleading conclusions. Furthermore, we find that the model’s own confidence score cannot be trusted and model introspection methods (using simpler models) do not help as they result in loss of predictive performance (reliability-explainability trade-off). Second, to overcome these challenges, we propose a general-purpose explainable and reliable machine-learning framework. Specifically, we propose a generic pipeline that employs an ensemble of simpler models to reliably predict material properties. We also propose a transfer learning technique and show that the performance loss due to models’ simplicity can be overcome by exploiting correlations among different material properties. A new evaluation metric and a trust score to better quantify the confidence in the predictions are also proposed. To improve the interpretability, we add a rationale generator component to our framework which provides both model-level and decision-level explanations. Finally, we demonstrate the versatility of our technique on two applications: (1) predicting properties of crystalline compounds and (2) identifying potentially stable solar cell materials. We also point to some outstanding issues yet to be resolved for a successful application of ML in material science.

Authors:
Kailkhura, Bhavya; Gallagher, Brian; Kim, Sookyung; Hiszpanski, Anna; Han, T. Yong-Jin
Publication Date:
Thu Nov 14 00:00:00 EST 2019
Research Org.:
Lawrence Livermore National Laboratory (LLNL), Livermore, CA (United States)
Sponsoring Org.:
USDOE National Nuclear Security Administration (NNSA); USDOE Laboratory Directed Research and Development (LDRD) Program
OSTI Identifier:
1619662
Alternate Identifier(s):
OSTI ID: 1734611
Report Number(s):
LLNL-JRNL-764864
Journal ID: ISSN 2057-3960; 108; PII: 248
Grant/Contract Number:  
AC52-07NA27344; 16-ERD-019; 19-SI-00
Resource Type:
Published Article
Journal Name:
npj Computational Materials
Additional Journal Information:
Journal Name: npj Computational Materials; Journal Volume: 5; Journal Issue: 1; Journal ID: ISSN 2057-3960
Publisher:
Nature Publishing Group
Country of Publication:
United Kingdom
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; Computational methods; design, synthesis and processing

Citation Formats

Kailkhura, Bhavya, Gallagher, Brian, Kim, Sookyung, Hiszpanski, Anna, and Han, T. Yong-Jin. Reliable and explainable machine-learning methods for accelerated material discovery. United Kingdom: N. p., 2019. Web. doi:10.1038/s41524-019-0248-2.
Kailkhura, Bhavya, Gallagher, Brian, Kim, Sookyung, Hiszpanski, Anna, & Han, T. Yong-Jin. Reliable and explainable machine-learning methods for accelerated material discovery. United Kingdom. https://doi.org/10.1038/s41524-019-0248-2
Kailkhura, Bhavya, Gallagher, Brian, Kim, Sookyung, Hiszpanski, Anna, and Han, T. Yong-Jin. 2019. "Reliable and explainable machine-learning methods for accelerated material discovery". United Kingdom. https://doi.org/10.1038/s41524-019-0248-2.
@article{osti_1619662,
title = {Reliable and explainable machine-learning methods for accelerated material discovery},
author = {Kailkhura, Bhavya and Gallagher, Brian and Kim, Sookyung and Hiszpanski, Anna and Han, T. Yong-Jin},
abstractNote = {Despite ML’s impressive performance in commercial applications, several unique challenges exist when applying ML in materials science applications. In such a context, the contributions of this work are twofold. First, we identify common pitfalls of existing ML techniques when learning from underrepresented/imbalanced material data. Specifically, we show that with imbalanced data, standard methods for assessing quality of ML models break down and lead to misleading conclusions. Furthermore, we find that the model’s own confidence score cannot be trusted and model introspection methods (using simpler models) do not help as they result in loss of predictive performance (reliability-explainability trade-off). Second, to overcome these challenges, we propose a general-purpose explainable and reliable machine-learning framework. Specifically, we propose a generic pipeline that employs an ensemble of simpler models to reliably predict material properties. We also propose a transfer learning technique and show that the performance loss due to models’ simplicity can be overcome by exploiting correlations among different material properties. A new evaluation metric and a trust score to better quantify the confidence in the predictions are also proposed. To improve the interpretability, we add a rationale generator component to our framework which provides both model-level and decision-level explanations. Finally, we demonstrate the versatility of our technique on two applications: (1) predicting properties of crystalline compounds and (2) identifying potentially stable solar cell materials. We also point to some outstanding issues yet to be resolved for a successful application of ML in material science.},
doi = {10.1038/s41524-019-0248-2},
journal = {npj Computational Materials},
number = 1,
volume = 5,
place = {United Kingdom},
year = {2019},
month = {11}
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record
https://doi.org/10.1038/s41524-019-0248-2

Citation Metrics:
Cited by: 78 works
Citation information provided by
Web of Science

