DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Theory-Guided Machine Learning in Materials Science

Abstract

Materials scientists are increasingly adopting the use of machine learning tools to discover hidden trends in data and make predictions. Applying concepts from data science without foreknowledge of their limitations and the unique qualities of materials data, however, could lead to errant conclusions. The differences that exist between various kinds of experimental and calculated data require careful choices of data processing and machine learning methods. Here, we outline potential pitfalls involved in using machine learning without robust protocols. We address some problems of overfitting to training data using decision trees as an example, rational descriptor selection in the field of perovskites, and preserving physical interpretability in the application of dimensionality reducing techniques such as principal component analysis. We show how proceeding without the guidance of domain knowledge can lead to both quantitatively and qualitatively incorrect predictive models.

Authors:
Wagner, Nicholas [1]; Rondinelli, James M. [1]
  1. Northwestern Univ., Evanston, IL (United States). Dept. of Materials Science and Engineering
Publication Date:
June 2016
Research Org.:
Northwestern Univ., Evanston, IL (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Basic Energy Sciences (BES) (SC-22); National Science Foundation (NSF)
OSTI Identifier:
1466646
Grant/Contract Number:  
SC0012375; DMR-1454688
Resource Type:
Accepted Manuscript
Journal Name:
Frontiers in Materials
Additional Journal Information:
Journal Volume: 3; Journal ID: ISSN 2296-8016
Publisher:
Frontiers Research Foundation
Country of Publication:
United States
Language:
English
Subject:
36 MATERIALS SCIENCE; materials informatics; theory; overfitting; descriptor selection; machine learning

Citation Formats

Wagner, Nicholas, and Rondinelli, James M. Theory-Guided Machine Learning in Materials Science. United States: N. p., 2016. Web. doi:10.3389/fmats.2016.00028.
Wagner, Nicholas, & Rondinelli, James M. Theory-Guided Machine Learning in Materials Science. United States. doi:10.3389/fmats.2016.00028.
Wagner, Nicholas, and Rondinelli, James M. 2016. "Theory-Guided Machine Learning in Materials Science". United States. doi:10.3389/fmats.2016.00028. https://www.osti.gov/servlets/purl/1466646.
@article{osti_1466646,
title = {Theory-Guided Machine Learning in Materials Science},
author = {Wagner, Nicholas and Rondinelli, James M.},
abstractNote = {Materials scientists are increasingly adopting the use of machine learning tools to discover hidden trends in data and make predictions. Applying concepts from data science without foreknowledge of their limitations and the unique qualities of materials data, however, could lead to errant conclusions. The differences that exist between various kinds of experimental and calculated data require careful choices of data processing and machine learning methods. Here, we outline potential pitfalls involved in using machine learning without robust protocols. We address some problems of overfitting to training data using decision trees as an example, rational descriptor selection in the field of perovskites, and preserving physical interpretability in the application of dimensionality reducing techniques such as principal component analysis. We show how proceeding without the guidance of domain knowledge can lead to both quantitatively and qualitatively incorrect predictive models.},
doi = {10.3389/fmats.2016.00028},
journal = {Frontiers in Materials},
volume = 3,
place = {United States},
year = {2016},
month = {6}
}


Citation Metrics:
Cited by: 15 works
Citation information provided by Web of Science


Works referenced in this record:

Prediction of the crystal structures of perovskites using the software program SPuDS
journal, November 2001

  • Lufaso, Michael W.; Woodward, Patrick M.
  • Acta Crystallographica Section B Structural Science, Vol. 57, Issue 6
  • DOI: 10.1107/S0108768101015282

Proxies from Ab Initio Calculations for Screening Efficient Ce3+ Phosphor Hosts
journal, August 2013

  • Brgoch, Jakoah; DenBaars, Steven P.; Seshadri, Ram
  • The Journal of Physical Chemistry C, Vol. 117, Issue 35
  • DOI: 10.1021/jp405858e

Compressive sensing as a paradigm for building physics models
journal, January 2013


What is principal component analysis?
journal, March 2008


Recent progress in bulk glassy, nanoquasicrystalline and nanocrystalline alloys
journal, July 2004


Principal Component Analysis of Catalytic Functions in the Composition Space of Heterogeneous Catalysts
journal, April 2007

  • Sieg, Simone C.; Suh, Changwon; Schmidt, Timm
  • QSAR & Combinatorial Science, Vol. 26, Issue 4
  • DOI: 10.1002/qsar.200620074

Classification of ABO3 perovskite solids: a machine learning study
journal, September 2015

  • Pilania, G.; Balachandran, P. V.; Gubernatis, J. E.
  • Acta Crystallographica Section B Structural Science, Crystal Engineering and Materials, Vol. 71, Issue 5
  • DOI: 10.1107/S2052520615013979

The Development of Descriptors for Solids: Teaching “Catalytic Intuition” to a Computer
journal, October 2004

  • Klanner, Catharina; Farrusseng, David; Baumes, Laurent
  • Angewandte Chemie International Edition, Vol. 43, Issue 40
  • DOI: 10.1002/anie.200460731

Deep feature synthesis: Towards automating data science endeavors
conference, October 2015

  • Kanter, James Max; Veeramachaneni, Kalyan
  • 2015 IEEE International Conference on Data Science and Advanced Analytics (DSAA)
  • DOI: 10.1109/DSAA.2015.7344858

Big Data of Materials Science: Critical Role of the Descriptor
journal, March 2015


Top-Down Induction of Decision Trees Classifiers—A Survey
journal, November 2005

  • Rokach, L.; Maimon, O.
  • IEEE Transactions on Systems, Man and Cybernetics, Part C (Applications and Reviews), Vol. 35, Issue 4
  • DOI: 10.1109/TSMCC.2004.843247

Identification of phases, symmetries and defects through local crystallography
journal, July 2015

  • Belianinov, Alex; He, Qian; Kravchenko, Mikhail
  • Nature Communications, Vol. 6, Issue 1
  • DOI: 10.1038/ncomms8801

Coupling and electrical control of structural, orbital and magnetic orders in perovskites
journal, October 2015

  • Varignon, Julien; Bristowe, Nicholas C.; Bousquet, Eric
  • Scientific Reports, Vol. 5, Issue 1
  • DOI: 10.1038/srep15364

Big data and deep data in scanning and electron microscopies: deriving functionality from multidimensional data sets
journal, May 2015

  • Belianinov, Alex; Vasudevan, Rama; Strelcov, Evgheni
  • Advanced Structural and Chemical Imaging, Vol. 1, Issue 1
  • DOI: 10.1186/s40679-015-0006-6

Mode crystallography of distorted structures
journal, July 2010

  • Perez-Mato, J. M.; Orobengoa, D.; Aroyo, M. I.
  • Acta Crystallographica Section A Foundations of Crystallography, Vol. 66, Issue 5
  • DOI: 10.1107/S0108767310016247

Finding Nature’s Missing Ternary Oxide Compounds Using Machine Learning and Density Functional Theory
journal, June 2010

  • Hautier, Geoffroy; Fischer, Christopher C.; Jain, Anubhav
  • Chemistry of Materials, Vol. 22, Issue 12
  • DOI: 10.1021/cm100795d

Research Update: Towards designed functionalities in oxide-based electronic materials
journal, August 2015

  • Rondinelli, James M.; Poeppelmeier, Kenneth R.; Zunger, Alex
  • APL Materials, Vol. 3, Issue 8
  • DOI: 10.1063/1.4928289

Identifying the ‘inorganic gene’ for high-temperature piezoelectric perovskites through statistical learning
journal, February 2011

  • Balachandran, Prasanna V.; Broderick, Scott R.; Rajan, Krishna
  • Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, Vol. 467, Issue 2132
  • DOI: 10.1098/rspa.2010.0543

Works referencing / citing this record:

Representing molecular and materials data for unsupervised machine learning
journal, April 2018


Predicting the Tensile Behaviour of Cast Alloys by a Pattern Recognition Analysis on Experimental Data
journal, May 2019

  • Fragassa, Cristiano; Babic, Matej; Bergmann, Carlos Perez
  • Metals, Vol. 9, Issue 5
  • DOI: 10.3390/met9050557