DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: A Strategic Approach to Machine Learning for Material Science: How to Tackle Real-World Challenges and Avoid Pitfalls

Abstract

The exponential growth and success of Machine Learning (ML) has resulted in its application in all scientific domains including Material Science. Advancement in experimental techniques has led to an increase in the volume of material science data encouraging material scientists to investigate data-driven solutions to scientific problems. While the resources available to get started with ML are ever increasing, there is little literature on traversing through the space of decisions that need to be made for implementing a robust and trustworthy ML solution. A lack of such resources leads to researchers wading through articles and papers trying to determine the best approach for their problem and sometimes also falling prey to pitfalls in a real-world scenario. This paper aims to act as a guide for researchers who want to strategically approach a ML solution to their problem through the use of domain knowledge and systematic evaluation of the major aspects of a ML pipeline. We focus on four aspects of the ML pipeline 1. problem formulation, 2. data curation, 3. feature representation and model selection, and 4. model generalizability and real-world performance. In each case, we discuss the space of decision, provide examples from scientific literature, and illustrate how different choices can affect the outcome through a case study of predicting compressive strength of uniaxially pressed molecular solid, 2,4,6-triamino-1,3,5-trinitrobenzene (TATB) samples. Using a similar approach of critical thinking along with rigorous evaluation and diagnostics, researchers can be assured of the reliability of predictions from their ML models.

Authors:
Karande, Piyush [1]; Gallagher, Brian [2]; Han, Thomas Yong-Jin [3]
  1. Computational Engineering Division, Lawrence Livermore National Laboratory, Livermore, California 94550, United States
  2. Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, Livermore, California 94550, United States
  3. Materials Science Division, Lawrence Livermore National Laboratory, Livermore, California 94550, United States
Publication Date:
2022-09-01
Research Org.:
Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
Sponsoring Org.:
USDOE National Nuclear Security Administration (NNSA); USDOE Laboratory Directed Research and Development (LDRD) Program
OSTI Identifier:
1885083
Alternate Identifier(s):
OSTI ID: 1890089
Report Number(s):
LLNL-JRNL-832494
Journal ID: ISSN 0897-4756
Grant/Contract Number:  
LDRD 19-SI-001; AC52-07NA27344; LDRD-19-SI-001
Resource Type:
Published Article
Journal Name:
Chemistry of Materials
Additional Journal Information:
Journal Name: Chemistry of Materials; Journal Volume: 34; Journal Issue: 17; Journal ID: ISSN 0897-4756
Publisher:
American Chemical Society
Country of Publication:
United States
Language:
English
Subject:
42 ENGINEERING; 36 MATERIALS SCIENCE; 97 MATHEMATICS AND COMPUTING; 37 INORGANIC, ORGANIC, PHYSICAL, AND ANALYTICAL CHEMISTRY

Citation Formats

Karande, Piyush, Gallagher, Brian, and Han, Thomas Yong-Jin. A Strategic Approach to Machine Learning for Material Science: How to Tackle Real-World Challenges and Avoid Pitfalls. United States: N. p., 2022. Web. doi:10.1021/acs.chemmater.2c01333.
Karande, Piyush, Gallagher, Brian, & Han, Thomas Yong-Jin. A Strategic Approach to Machine Learning for Material Science: How to Tackle Real-World Challenges and Avoid Pitfalls. United States. https://doi.org/10.1021/acs.chemmater.2c01333
Karande, Piyush, Gallagher, Brian, and Han, Thomas Yong-Jin. 2022. "A Strategic Approach to Machine Learning for Material Science: How to Tackle Real-World Challenges and Avoid Pitfalls". United States. https://doi.org/10.1021/acs.chemmater.2c01333.
@article{osti_1885083,
title = {A Strategic Approach to Machine Learning for Material Science: How to Tackle Real-World Challenges and Avoid Pitfalls},
author = {Karande, Piyush and Gallagher, Brian and Han, Thomas Yong-Jin},
abstractNote = {The exponential growth and success of Machine Learning (ML) has resulted in its application in all scientific domains including Material Science. Advancement in experimental techniques has led to an increase in the volume of material science data encouraging material scientists to investigate data-driven solutions to scientific problems. While the resources available to get started with ML are ever increasing, there is little literature on traversing through the space of decisions that need to be made for implementing a robust and trustworthy ML solution. A lack of such resources leads to researchers wading through articles and papers trying to determine the best approach for their problem and sometimes also falling prey to pitfalls in a real-world scenario. This paper aims to act as a guide for researchers who want to strategically approach a ML solution to their problem through the use of domain knowledge and systematic evaluation of the major aspects of a ML pipeline. We focus on four aspects of the ML pipeline 1. problem formulation, 2. data curation, 3. feature representation and model selection, and 4. model generalizability and real-world performance. In each case, we discuss the space of decision, provide examples from scientific literature, and illustrate how different choices can affect the outcome through a case study of predicting compressive strength of uniaxially pressed molecular solid, 2,4,6-triamino-1,3,5-trinitrobenzene (TATB) samples. Using a similar approach of critical thinking along with rigorous evaluation and diagnostics, researchers can be assured of the reliability of predictions from their ML models.},
doi = {10.1021/acs.chemmater.2c01333},
journal = {Chemistry of Materials},
number = 17,
volume = 34,
place = {United States},
year = {2022},
month = sep
}
