DOE PAGES

Title: Theory-Guided Machine Learning in Materials Science

Materials scientists are increasingly adopting machine learning tools to discover hidden trends in data and make predictions. Applying concepts from data science without foreknowledge of their limitations and the unique qualities of materials data, however, could lead to errant conclusions. The differences that exist between various kinds of experimental and calculated data require careful choices of data processing and machine learning methods. Here, we outline potential pitfalls involved in using machine learning without robust protocols. We address some problems of overfitting to training data using decision trees as an example, rational descriptor selection in the field of perovskites, and preserving physical interpretability in the application of dimensionality-reducing techniques such as principal component analysis. We show how proceeding without the guidance of domain knowledge can lead to both quantitatively and qualitatively incorrect predictive models.
Authors:
 Wagner, Nicholas [1]; Rondinelli, James M. [1]
  1. Northwestern Univ., Evanston, IL (United States). Dept. of Materials Science and Engineering
Publication Date:
Grant/Contract Number:
SC0012375; DMR-1454688
Type:
Accepted Manuscript
Journal Name:
Frontiers in Materials
Additional Journal Information:
Journal Volume: 3; Journal ID: ISSN 2296-8016
Publisher:
Frontiers Research Foundation
Research Org:
Northwestern Univ., Evanston, IL (United States)
Sponsoring Org:
USDOE Office of Science (SC), Basic Energy Sciences (BES) (SC-22); National Science Foundation (NSF)
Country of Publication:
United States
Language:
English
Subject:
36 MATERIALS SCIENCE; materials informatics; theory; overfitting; descriptor selection; machine learning
OSTI Identifier:
1466646

Wagner, Nicholas, and Rondinelli, James M. Theory-Guided Machine Learning in Materials Science. United States: N. p., 2016. Web. doi:10.3389/fmats.2016.00028.
Wagner, Nicholas, & Rondinelli, James M. (2016). Theory-Guided Machine Learning in Materials Science. United States. doi:10.3389/fmats.2016.00028.
Wagner, Nicholas, and Rondinelli, James M. 2016. "Theory-Guided Machine Learning in Materials Science". United States. doi:10.3389/fmats.2016.00028. https://www.osti.gov/servlets/purl/1466646.
@article{osti_1466646,
title = {Theory-Guided Machine Learning in Materials Science},
author = {Wagner, Nicholas and Rondinelli, James M.},
abstractNote = {Materials scientists are increasingly adopting the use of machine learning tools to discover hidden trends in data and make predictions. Applying concepts from data science without foreknowledge of their limitations and the unique qualities of materials data, however, could lead to errant conclusions. The differences that exist between various kinds of experimental and calculated data require careful choices of data processing and machine learning methods. Here, we outline potential pitfalls involved in using machine learning without robust protocols. We address some problems of overfitting to training data using decision trees as an example, rational descriptor selection in the field of perovskites, and preserving physical interpretability in the application of dimensionality reducing techniques such as principal component analysis. We show how proceeding without the guidance of domain knowledge can lead to both quantitatively and qualitatively incorrect predictive models.},
doi = {10.3389/fmats.2016.00028},
journal = {Frontiers in Materials},
volume = 3,
place = {United States},
year = {2016},
month = {6}
}