A Strategic Approach to Machine Learning for Material Science: How to Tackle Real-World Challenges and Avoid Pitfalls
Abstract
The exponential growth and success of Machine Learning (ML) have resulted in its application in all scientific domains, including Material Science. Advancement in experimental techniques has led to an increase in the volume of material science data, encouraging material scientists to investigate data-driven solutions to scientific problems. While the resources available to get started with ML are ever increasing, there is little literature on traversing the space of decisions that need to be made to implement a robust and trustworthy ML solution. A lack of such resources leads to researchers wading through articles and papers trying to determine the best approach for their problem, and sometimes falling prey to pitfalls in a real-world scenario. This paper aims to act as a guide for researchers who want to strategically approach an ML solution to their problem through the use of domain knowledge and systematic evaluation of the major aspects of an ML pipeline. We focus on four aspects of the ML pipeline: (1) problem formulation, (2) data curation, (3) feature representation and model selection, and (4) model generalizability and real-world performance. In each case, we discuss the space of decisions, provide examples from scientific literature, and illustrate how different choices can affect the outcome through a case study of predicting compressive strength of uniaxially pressed molecular solid, 2,4,6-triamino-1,3,5-trinitrobenzene (TATB) samples. Using a similar approach of critical thinking along with rigorous evaluation and diagnostics, researchers can be assured of the reliability of predictions from their ML models.
- Authors:
- Karande, Piyush; Gallagher, Brian; Han, Thomas Yong-Jin
- Computational Engineering Division, Lawrence Livermore National Laboratory, Livermore, California 94550, United States
- Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, Livermore, California 94550, United States
- Materials Science Division, Lawrence Livermore National Laboratory, Livermore, California 94550, United States
- Publication Date:
- 2022-09-01
- Research Org.:
- Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
- Sponsoring Org.:
- USDOE National Nuclear Security Administration (NNSA); USDOE Laboratory Directed Research and Development (LDRD) Program
- OSTI Identifier:
- 1885083
- Alternate Identifier(s):
- OSTI ID: 1890089
- Report Number(s):
- LLNL-JRNL-832494
- Journal ID: ISSN 0897-4756
- Grant/Contract Number:
- LDRD 19-SI-001; AC52-07NA27344; LDRD-19-SI-001
- Resource Type:
- Published Article
- Journal Name:
- Chemistry of Materials
- Additional Journal Information:
- Journal Name: Chemistry of Materials; Journal Volume: 34; Journal Issue: 17; Journal ID: ISSN 0897-4756
- Publisher:
- American Chemical Society
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 42 ENGINEERING; 36 MATERIALS SCIENCE; 97 MATHEMATICS AND COMPUTING; 37 INORGANIC, ORGANIC, PHYSICAL, AND ANALYTICAL CHEMISTRY
Citation Formats
Karande, Piyush, Gallagher, Brian, and Han, Thomas Yong-Jin. A Strategic Approach to Machine Learning for Material Science: How to Tackle Real-World Challenges and Avoid Pitfalls. United States: N. p., 2022.
Web. doi:10.1021/acs.chemmater.2c01333.
Karande, Piyush, Gallagher, Brian, & Han, Thomas Yong-Jin. A Strategic Approach to Machine Learning for Material Science: How to Tackle Real-World Challenges and Avoid Pitfalls. United States. https://doi.org/10.1021/acs.chemmater.2c01333
Karande, Piyush, Gallagher, Brian, and Han, Thomas Yong-Jin. 2022.
"A Strategic Approach to Machine Learning for Material Science: How to Tackle Real-World Challenges and Avoid Pitfalls". United States. https://doi.org/10.1021/acs.chemmater.2c01333.
@article{osti_1885083,
title = {A Strategic Approach to Machine Learning for Material Science: How to Tackle Real-World Challenges and Avoid Pitfalls},
author = {Karande, Piyush and Gallagher, Brian and Han, Thomas Yong-Jin},
abstractNote = {The exponential growth and success of Machine Learning (ML) has resulted in its application in all scientific domains including Material Science. Advancement in experimental techniques has led to an increase in the volume of material science data encouraging material scientists to investigate data-driven solutions to scientific problems. While the resources available to get started with ML are ever increasing, there is little literature on traversing through the space of decisions that need to be made for implementing a robust and trustworthy ML solution. A lack of such resources leads to researchers wading through articles and papers trying to determine the best approach for their problem and sometimes also falling prey to pitfalls in a real-world scenario. This paper aims to act as a guide for researchers who want to strategically approach a ML solution to their problem through the use of domain knowledge and systematic evaluation of the major aspects of a ML pipeline. We focus on four aspects of the ML pipeline 1. problem formulation, 2. data curation, 3. feature representation and model selection, and 4. model generalizability and real-world performance. In each case, we discuss the space of decision, provide examples from scientific literature, and illustrate how different choices can affect the outcome through a case study of predicting compressive strength of uniaxially pressed molecular solid, 2,4,6-triamino-1,3,5-trinitrobenzene (TATB) samples. Using a similar approach of critical thinking along with rigorous evaluation and diagnostics, researchers can be assured of the reliability of predictions from their ML models.},
doi = {10.1021/acs.chemmater.2c01333},
journal = {Chemistry of Materials},
number = 17,
volume = 34,
place = {United States},
year = {2022},
month = {9}
}
https://doi.org/10.1021/acs.chemmater.2c01333