OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Toward Uncertainty Quantification for Supervised Classification.

Abstract

Our goal is to develop a general theoretical basis for quantifying uncertainty in supervised machine learning models. Current accuracy-based machine learning validation metrics indicate how well a classifier performs on a given data set as a whole. However, these metrics do not tell us how effectively a model predicts particular samples. We quantify uncertainty by constructing probability distributions of the predictions made by an ensemble of classifiers. This report details our initial investigations into uncertainty quantification for supervised machine learning. We apply an uncertainty analysis to the problem of malicious website detection. Machine learning models can be trained to find suspicious characteristics in the text of a website's Uniform Resource Locator (URL). However, given the vast number of URLs and the ever-changing tactics of malicious actors, it will always be possible to find sets of websites that are outliers with respect to a model's hypothesis. Therefore, we seek to understand a model's per-sample reliability when classifying URL data.

Acknowledgements

This work was funded by the Sandia National Laboratories Laboratory Directed Research and Development (LDRD) program.
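The ensemble-based approach described in the abstract can be illustrated with a short sketch. The Python example below is not taken from the report; the toy URLs, the character n-gram features, and the logistic-regression base classifier are illustrative assumptions. It trains an ensemble of classifiers on bootstrap resamples of labeled URL data and summarizes the distribution of predicted probabilities for each sample, whose spread serves as a simple per-sample reliability indicator.

# Minimal sketch (assumed details, not the authors' code): per-sample
# predictive uncertainty from an ensemble trained on bootstrap resamples.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.utils import resample

# Toy URL data with hypothetical labels: 1 = malicious, 0 = benign.
urls = [
    "http://example.com/index.html",
    "http://login-secure-update.example.ru/verify",
    "http://university.edu/courses/cs101",
    "http://free-prizes-click-now.biz/win",
]
labels = np.array([0, 1, 0, 1])

# Character n-gram features extracted from the raw URL text.
vectorizer = CountVectorizer(analyzer="char", ngram_range=(2, 4))
X = vectorizer.fit_transform(urls)

# Train an ensemble on stratified bootstrap resamples and record each
# member's predicted probability of the "malicious" class for every sample.
n_members = 50
rng = np.random.RandomState(0)
probs = np.zeros((n_members, len(urls)))
for m in range(n_members):
    X_boot, y_boot = resample(X, labels, stratify=labels, random_state=rng)
    clf = LogisticRegression(max_iter=1000).fit(X_boot, y_boot)
    probs[m] = clf.predict_proba(X)[:, 1]

# Summarize the per-sample prediction distribution: the spread across
# ensemble members indicates how reliable the prediction is for that sample.
for url, mean_p, std_p in zip(urls, probs.mean(axis=0), probs.std(axis=0)):
    print(f"{url}: P(malicious) ~ {mean_p:.2f} +/- {std_p:.2f}")

A sample with a wide spread of ensemble predictions is one for which the model's output should be treated with caution, even if the ensemble's mean prediction is confident.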

Authors:
Darling, Michael Christopher; Stracuzzi, David John
Publication Date:
January 2018
Research Org.:
Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
Sponsoring Org.:
USDOE National Nuclear Security Administration (NNSA)
OSTI Identifier:
1527311
Report Number(s):
SAND2018-0032
662975
DOE Contract Number:  
AC04-94AL85000
Resource Type:
Technical Report
Country of Publication:
United States
Language:
English

Citation Formats

Darling, Michael Christopher, and Stracuzzi, David John. Toward Uncertainty Quantification for Supervised Classification. United States: N. p., 2018. Web. doi:10.2172/1527311.
Darling, Michael Christopher, & Stracuzzi, David John. Toward Uncertainty Quantification for Supervised Classification. United States. doi:10.2172/1527311.
Darling, Michael Christopher, and Stracuzzi, David John. 2018. "Toward Uncertainty Quantification for Supervised Classification." United States. doi:10.2172/1527311. https://www.osti.gov/servlets/purl/1527311.
@article{osti_1527311,
title = {Toward Uncertainty Quantification for Supervised Classification.},
author = {Darling, Michael Christopher and Stracuzzi, David John},
abstractNote = {Our goal is to develop a general theoretical basis for quantifying uncertainty in supervised machine learning models. Current accuracy-based machine learning validation metrics indicate how well a classifier performs on a given data set as a whole. However, these metrics do not tell us how effectively a model predicts particular samples. We quantify uncertainty by constructing probability distributions of the predictions made by an ensemble of classifiers. This report details our initial investigations into uncertainty quantification for supervised machine learning. We apply an uncertainty analysis to the problem of malicious website detection. Machine learning models can be trained to find suspicious characteristics in the text of a website's Uniform Resource Locator (URL). However, given the vast number of URLs and the ever-changing tactics of malicious actors, it will always be possible to find sets of websites that are outliers with respect to a model's hypothesis. Therefore, we seek to understand a model's per-sample reliability when classifying URL data. Acknowledgements: This work was funded by the Sandia National Laboratories Laboratory Directed Research and Development (LDRD) program.},
doi = {10.2172/1527311},
place = {United States},
year = {2018},
month = {1}
}