An Open Combinatorial Diffraction Dataset Including Consensus Human and Machine Learning Labels with Quantified Uncertainty for Training New Machine Learning Models
Abstract
Modern machine learning and autonomous experimentation schemes in materials science rely on accurate analysis of the data ingested by these models. Unfortunately, accurate analysis of the underlying data can be difficult, even for domain experts, complicating the training of the models intended to drive experiments. This is especially true when the goal is to identify the presence of weak signatures in diffraction or spectroscopic datasets. In this work, we examine a set of as-obtained diffraction data that track the phase transition from monoclinic to tetragonal in a Nb-doped VO2 film as a function of temperature and dopant concentration. We then task a set of domain experts and a set of machine learning experts with identifying which phase is present in each diffraction pattern manually and algorithmically, respectively; in both cases, the labels can vary dramatically, especially at the phase boundaries. We use the mode of the labels and the Shannon entropy as a method to capture, preserve and propagate consensus labels and their variance. Further, we use the expert labels as a benchmark and demonstrate the use of Shannon entropy weighted scoring to test the performance of machine learning generated labels. Finally, we propose a material data challenge centered around generating improved labeling algorithms. This real-world dataset curated with expert labels can act as a test bed for new algorithms. The raw data, annotations and code used in this study are all available online at data.gov and the interested reader is encouraged to replicate and improve the existing models.
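The consensus-labeling idea in the abstract can be illustrated with a minimal sketch. This is not the authors' released code: the function names, the phase labels "M" and "T", and the weighting rule (one minus the normalized entropy) are all assumptions made for illustration, since this record does not give the paper's exact scoring formula. For one diffraction pattern with label fractions p_i, the Shannon entropy is H = -sum(p_i * log2(p_i)), which is 0 when all labelers agree and log2(k) when k labels are equally popular.

```python
import math
from collections import Counter


def consensus_and_entropy(labels):
    """Return (mode label, Shannon entropy in bits) for one pattern's labels."""
    counts = Counter(labels)
    total = len(labels)
    # Iterate over sorted keys so ties resolve to the alphabetically first label.
    mode_label = max(sorted(counts), key=lambda k: counts[k])
    # H = sum p * log2(1/p); written this way to avoid returning -0.0.
    entropy = sum((c / total) * math.log2(total / c) for c in counts.values())
    return mode_label, entropy


def entropy_weighted_accuracy(expert_labels, predicted, n_classes):
    """Score predictions against consensus labels, down-weighting contested patterns.

    ASSUMPTION: weight = 1 - H / log2(n_classes). The paper's actual
    weighting scheme may differ.
    """
    max_entropy = math.log2(n_classes)
    weighted_hits = 0.0
    total_weight = 0.0
    for labels, pred in zip(expert_labels, predicted):
        mode_label, entropy = consensus_and_entropy(labels)
        weight = 1.0 - entropy / max_entropy
        total_weight += weight
        if pred == mode_label:
            weighted_hits += weight
    return weighted_hits / total_weight if total_weight > 0 else 0.0


# Hypothetical labels from four experts for three patterns
# ("M" = monoclinic, "T" = tetragonal).
expert_labels = [
    ["M", "M", "M", "M"],  # full agreement, entropy 0
    ["M", "T", "M", "T"],  # even split, entropy 1 bit, weight 0
    ["T", "T", "T", "M"],  # mild disagreement
]
score = entropy_weighted_accuracy(expert_labels, ["M", "T", "T"], n_classes=2)
print(score)
```

Under this weighting, a pattern on which the experts split evenly contributes nothing to the score, so a model is neither rewarded nor penalized where the experts themselves have no consensus.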
- Authors:
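- Hattrick-Simpers, Jason R.; DeCost, Brian; Kusne, A. Gilad; Joress, Howie; Wong-Ng, Winnie; Kaiser, Debra L.; Zakutayev, Andriy; Phillips, Caleb; Sun, Shijing; Thapa, Janak; Yu, Heshan; Takeuchi, Ichiro; Buonassisi, Tonio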
- National Inst. of Standards and Technology (NIST), Gaithersburg, MD (United States)
- National Renewable Energy Lab. (NREL), Golden, CO (United States)
- Massachusetts Inst. of Technology (MIT), Cambridge, MA (United States)
- Univ. of Maryland, College Park, MD (United States)
- Publication Date:
- 2021-06-09
- Research Org.:
- National Renewable Energy Laboratory (NREL), Golden, CO (United States)
- Sponsoring Org.:
- USDOE National Renewable Energy Laboratory (NREL), Laboratory Directed Research and Development (LDRD) Program
- OSTI Identifier:
- 1798722
- Report Number(s):
- NREL/JA-5K00-78444
Journal ID: ISSN 2193-9764; MainId:32361;UUID:5f3ffc91-f636-4657-aa90-7f0c4826215d;MainAdminID:25675
- Grant/Contract Number:
- AC36-08GO28308
- Resource Type:
- Accepted Manuscript
- Journal Name:
- Integrating Materials and Manufacturing Innovation
- Additional Journal Information:
- Journal Volume: 10; Journal Issue: 2; Journal ID: ISSN 2193-9764
- Publisher:
- Springer
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 97 MATHEMATICS AND COMPUTING; combinatorial; diffraction; machine learning; VO2
Citation Formats
Hattrick-Simpers, Jason R., DeCost, Brian, Kusne, A. Gilad, Joress, Howie, Wong-Ng, Winnie, Kaiser, Debra L., Zakutayev, Andriy, Phillips, Caleb, Sun, Shijing, Thapa, Janak, Yu, Heshan, Takeuchi, Ichiro, and Buonassisi, Tonio. An Open Combinatorial Diffraction Dataset Including Consensus Human and Machine Learning Labels with Quantified Uncertainty for Training New Machine Learning Models. United States: N. p., 2021. Web. doi:10.1007/s40192-021-00213-8.
Hattrick-Simpers, Jason R., DeCost, Brian, Kusne, A. Gilad, Joress, Howie, Wong-Ng, Winnie, Kaiser, Debra L., Zakutayev, Andriy, Phillips, Caleb, Sun, Shijing, Thapa, Janak, Yu, Heshan, Takeuchi, Ichiro, & Buonassisi, Tonio. An Open Combinatorial Diffraction Dataset Including Consensus Human and Machine Learning Labels with Quantified Uncertainty for Training New Machine Learning Models. United States. https://doi.org/10.1007/s40192-021-00213-8
Hattrick-Simpers, Jason R., DeCost, Brian, Kusne, A. Gilad, Joress, Howie, Wong-Ng, Winnie, Kaiser, Debra L., Zakutayev, Andriy, Phillips, Caleb, Sun, Shijing, Thapa, Janak, Yu, Heshan, Takeuchi, Ichiro, and Buonassisi, Tonio. 2021. "An Open Combinatorial Diffraction Dataset Including Consensus Human and Machine Learning Labels with Quantified Uncertainty for Training New Machine Learning Models". United States. https://doi.org/10.1007/s40192-021-00213-8. https://www.osti.gov/servlets/purl/1798722.
@article{osti_1798722,
title = {An Open Combinatorial Diffraction Dataset Including Consensus Human and Machine Learning Labels with Quantified Uncertainty for Training New Machine Learning Models},
author = {Hattrick-Simpers, Jason R. and DeCost, Brian and Kusne, A. Gilad and Joress, Howie and Wong-Ng, Winnie and Kaiser, Debra L. and Zakutayev, Andriy and Phillips, Caleb and Sun, Shijing and Thapa, Janak and Yu, Heshan and Takeuchi, Ichiro and Buonassisi, Tonio},
abstractNote = {Modern machine learning and autonomous experimentation schemes in materials science rely on accurate analysis of the data ingested by these models. Unfortunately, accurate analysis of the underlying data can be difficult, even for domain experts, complicating the training of the models intended to drive experiments. This is especially true when the goal is to identify the presence of weak signatures in diffraction or spectroscopic datasets. In this work, we examine a set of as-obtained diffraction data that track the phase transition from monoclinic to tetragonal in a Nb-doped VO2 film as a function of temperature and dopant concentration. We then task a set of domain experts and a set of machine learning experts with identifying which phase is present in each diffraction pattern manually and algorithmically, respectively; in both cases, the labels can vary dramatically, especially at the phase boundaries. We use the mode of the labels and the Shannon entropy as a method to capture, preserve and propagate consensus labels and their variance. Further, we use the expert labels as a benchmark and demonstrate the use of Shannon entropy weighted scoring to test the performance of machine learning generated labels. Finally, we propose a material data challenge centered around generating improved labeling algorithms. This real-world dataset curated with expert labels can act as a test bed for new algorithms. The raw data, annotations and code used in this study are all available online at data.gov and the interested reader is encouraged to replicate and improve the existing models.},
doi = {10.1007/s40192-021-00213-8},
journal = {Integrating Materials and Manufacturing Innovation},
number = 2,
volume = 10,
place = {United States},
year = {2021},
month = jun
}