Reliable and explainable machine-learning methods for accelerated material discovery
Abstract
Despite ML’s impressive performance in commercial applications, several unique challenges exist when applying ML in materials science applications. In such a context, the contributions of this work are twofold. First, we identify common pitfalls of existing ML techniques when learning from underrepresented/imbalanced material data. Specifically, we show that with imbalanced data, standard methods for assessing quality of ML models break down and lead to misleading conclusions. Furthermore, we find that the model’s own confidence score cannot be trusted and model introspection methods (using simpler models) do not help as they result in loss of predictive performance (reliability-explainability trade-off). Second, to overcome these challenges, we propose a general-purpose explainable and reliable machine-learning framework. Specifically, we propose a generic pipeline that employs an ensemble of simpler models to reliably predict material properties. We also propose a transfer learning technique and show that the performance loss due to models’ simplicity can be overcome by exploiting correlations among different material properties. A new evaluation metric and a trust score to better quantify the confidence in the predictions are also proposed. To improve the interpretability, we add a rationale generator component to our framework which provides both model-level and decision-level explanations. Finally, we demonstrate the versatility of our technique on two applications: (1) predicting properties of crystalline compounds and (2) identifying potentially stable solar cell materials. We also point to some outstanding issues yet to be resolved for a successful application of ML in material science.
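- Authors:
- Kailkhura, Bhavya; Gallagher, Brian; Kim, Sookyung; Hiszpanski, Anna; Han, T. Yong-Jin
- Publication Date:
- November 2019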
- Research Org.:
- Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
- Sponsoring Org.:
- USDOE National Nuclear Security Administration (NNSA); USDOE Laboratory Directed Research and Development (LDRD) Program
- OSTI Identifier:
- 1619662
- Alternate Identifier(s):
- OSTI ID: 1734611
- Report Number(s):
- LLNL-JRNL-764864; Journal ID: ISSN 2057-3960; 108; PII: 248
- Grant/Contract Number:
- AC52-07NA27344; 16-ERD-019; 19-SI-00
- Resource Type:
- Published Article
- Journal Name:
- npj Computational Materials
- Additional Journal Information:
- Journal Name: npj Computational Materials; Journal Volume: 5; Journal Issue: 1; Journal ID: ISSN 2057-3960
- Publisher:
- Nature Publishing Group
- Country of Publication:
- United Kingdom
- Language:
- English
- Subject:
- 97 MATHEMATICS AND COMPUTING; Computational methods; design, synthesis and processing
Citation Formats
Kailkhura, Bhavya, Gallagher, Brian, Kim, Sookyung, Hiszpanski, Anna, and Han, T. Yong-Jin. Reliable and explainable machine-learning methods for accelerated material discovery. United Kingdom: N. p., 2019. Web. doi:10.1038/s41524-019-0248-2.
Kailkhura, Bhavya, Gallagher, Brian, Kim, Sookyung, Hiszpanski, Anna, & Han, T. Yong-Jin. Reliable and explainable machine-learning methods for accelerated material discovery. United Kingdom. https://doi.org/10.1038/s41524-019-0248-2
Kailkhura, Bhavya, Gallagher, Brian, Kim, Sookyung, Hiszpanski, Anna, and Han, T. Yong-Jin. 2019. "Reliable and explainable machine-learning methods for accelerated material discovery". United Kingdom. https://doi.org/10.1038/s41524-019-0248-2.
@article{osti_1619662,
title = {Reliable and explainable machine-learning methods for accelerated material discovery},
author = {Kailkhura, Bhavya and Gallagher, Brian and Kim, Sookyung and Hiszpanski, Anna and Han, T. Yong-Jin},
abstractNote = {Despite ML’s impressive performance in commercial applications, several unique challenges exist when applying ML in materials science applications. In such a context, the contributions of this work are twofold. First, we identify common pitfalls of existing ML techniques when learning from underrepresented/imbalanced material data. Specifically, we show that with imbalanced data, standard methods for assessing quality of ML models break down and lead to misleading conclusions. Furthermore, we find that the model’s own confidence score cannot be trusted and model introspection methods (using simpler models) do not help as they result in loss of predictive performance (reliability-explainability trade-off). Second, to overcome these challenges, we propose a general-purpose explainable and reliable machine-learning framework. Specifically, we propose a generic pipeline that employs an ensemble of simpler models to reliably predict material properties. We also propose a transfer learning technique and show that the performance loss due to models’ simplicity can be overcome by exploiting correlations among different material properties. A new evaluation metric and a trust score to better quantify the confidence in the predictions are also proposed. To improve the interpretability, we add a rationale generator component to our framework which provides both model-level and decision-level explanations. Finally, we demonstrate the versatility of our technique on two applications: (1) predicting properties of crystalline compounds and (2) identifying potentially stable solar cell materials. We also point to some outstanding issues yet to be resolved for a successful application of ML in material science.},
doi = {10.1038/s41524-019-0248-2},
journal = {npj Computational Materials},
number = 1,
volume = 5,
place = {United Kingdom},
year = {2019},
month = {11}
}
https://doi.org/10.1038/s41524-019-0248-2