DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Assessment of Outliers in Alloy Datasets Using Unsupervised Techniques

Journal Article · JOM. Journal of the Minerals, Metals & Materials Society
[1]; [2]; [2]; [3]; [3]
  1. National Energy Technology Lab. (NETL), Albany, OR (United States). Support Contractor
  2. Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
  3. National Energy Technology Lab. (NETL), Albany, OR (United States)

Advancements in data analytics techniques have enabled complex, disparate datasets to be leveraged for alloy design. Identifying outliers in a dataset can reduce noise, flag erroneous and/or anomalous records, prevent overfitting, and improve model assessment and optimization. In this work, two alloy datasets (9–12% Cr ferritic-martensitic steels and austenitic stainless steels) were assessed for outliers using unsupervised techniques supplemented with domain knowledge. Principal component analysis and k-means clustering were applied to the data, and points were assessed as outliers based on their distance from other points in the same cluster and from other points in the dataset. The outlier characteristics were investigated to determine both cluster-specific and overall trends in the properties of the outlier points. The approach demonstrated here is extensible to other alloy datasets for outlier identification and evaluation, with the aim of improving the reliability of machine learning and modeling predictions for advanced alloy design.
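The abstract does not give implementation details, so the following is only a minimal sketch of the general idea (PCA, then k-means, then distance-based flagging), not the authors' procedure. It assumes NumPy only, uses distance to the cluster centroid and to the dataset centroid as a stand-in for "distance from other points", and applies a median plus 3 scaled MAD threshold. The function names, the threshold rule, and the synthetic data are all illustrative assumptions.

```python
import numpy as np

def pca_project(X, n_components=2):
    """Standardize columns and project onto the leading principal components."""
    std = X.std(axis=0)
    std[std == 0] = 1.0                      # avoid dividing by zero for constant columns
    Z = (X - X.mean(axis=0)) / std
    # Rows of Vt are principal axes, ordered by decreasing explained variance.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt[:n_components].T

def kmeans(P, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    centers = P[rng.choice(len(P), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(P[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            P[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final assignment so labels match the returned centers.
    dists = np.linalg.norm(P[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1), centers

def robust_flags(d, n_mads=3.0):
    """Flag distances above median + n_mads * scaled MAD (an assumed rule)."""
    med = np.median(d)
    mad = 1.4826 * np.median(np.abs(d - med))
    return d > med + n_mads * mad

def flag_outliers(X, k=2, n_components=2, n_mads=3.0, seed=0):
    """Return (labels, in_cluster_flags, overall_flags) for the rows of X."""
    P = pca_project(X, n_components)
    labels, centers = kmeans(P, k, seed=seed)
    d_cluster = np.linalg.norm(P - centers[labels], axis=1)   # to own cluster centroid
    d_global = np.linalg.norm(P - P.mean(axis=0), axis=1)     # to dataset centroid
    in_cluster = np.zeros(len(P), dtype=bool)
    for j in range(k):
        members = labels == j
        if members.any():
            in_cluster[members] = robust_flags(d_cluster[members], n_mads)
    return labels, in_cluster, robust_flags(d_global, n_mads)

# Synthetic stand-in for an alloy table: two groups plus one anomalous record.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 0.5, size=(50, 4)),
    rng.normal(5.0, 0.5, size=(50, 4)),
    [[20.0, -20.0, 20.0, -20.0]],
])
labels, in_cluster, overall = flag_outliers(X, k=2)
print("records flagged at cluster level:", np.flatnonzero(in_cluster))
print("records flagged at dataset level:", np.flatnonzero(overall))
```

Keeping the cluster-level and dataset-level flags separate mirrors the abstract's distinction between cluster-specific and overall outlier trends. In practice the flagged records would still need review against domain knowledge before being removed.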

Research Organization:
National Energy Technology Laboratory (NETL), Pittsburgh, PA, Morgantown, WV, and Albany, OR (United States); Pacific Northwest National Laboratory (PNNL), Richland, WA (United States)
Sponsoring Organization:
USDOE Office of Fossil Energy (FE)
Grant/Contract Number:
89243318CFE000003; AC05-76RL01830
OSTI ID:
1876555
Report Number(s):
PNNL-SA-169146
Journal Information:
JOM. Journal of the Minerals, Metals & Materials Society, Vol. 74, Issue 7; ISSN 1047-4838
Publisher:
Springer
Country of Publication:
United States
Language:
English
