DOE PAGES

Title: Classifying with confidence from incomplete information.

In this paper, we consider the problem of classifying a test sample given incomplete information. This problem arises naturally when data about a test sample is collected over time, or when costs must be incurred to compute the classification features. For example, in a distributed sensor network only a fraction of the sensors may have reported measurements at a certain time, and additional time, power, and bandwidth are needed to collect the complete data to classify. A practical goal is to assign a class label as soon as enough data is available to make a good decision. We formalize this goal through the notion of reliability (the probability that a label assigned given incomplete data would be the same as the label assigned given the complete data), and we propose a method to classify incomplete data only if some reliability threshold is met. Our approach models the complete data as a random variable whose distribution is dependent on the current incomplete data and the (complete) training data. The method differs from standard imputation strategies in that our focus is on determining the reliability of the classification decision, rather than just the class label. We show that the method provides useful reliability estimates of the correctness of the imputed class labels on a set of experiments on time-series data sets, where the goal is to classify the time-series as early as possible while still guaranteeing that the reliability threshold is met.
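The thresholding idea in the abstract can be illustrated with a minimal sketch. This is not the authors' method: it assumes a nearest-centroid classifier and approximates the distribution of the complete data by the k training samples closest to the test sample on the observed features. All names (`classify_if_reliable`, `class_centroids`, `nearest_centroid_label`) and the toy data are hypothetical.

```python
from collections import Counter
from math import dist


def class_centroids(train_x, train_y):
    """Per-class mean of the complete training samples."""
    sums, counts = {}, Counter(train_y)
    for x, y in zip(train_x, train_y):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def nearest_centroid_label(x, centroids):
    """Label of the class centroid closest to the complete sample x."""
    return min(centroids, key=lambda label: dist(x, centroids[label]))


def classify_if_reliable(observed, train_x, train_y, threshold=0.9, k=5):
    """Return (label, reliability); label is None if reliability < threshold.

    `observed` holds the first m features of the test sample (assumption:
    features arrive in order, as in a time series).
    """
    m = len(observed)
    centroids = class_centroids(train_x, train_y)
    # Assumed stand-in for the distribution of the complete data: the k
    # training samples nearest to the test sample on the observed prefix.
    neighbours = sorted(train_x, key=lambda x: dist(x[:m], observed))[:k]
    completions = [list(observed) + list(x[m:]) for x in neighbours]
    votes = Counter(nearest_centroid_label(c, centroids) for c in completions)
    label, count = votes.most_common(1)[0]
    # Fraction of plausible completions that receive the same label.
    reliability = count / len(completions)
    return (label if reliability >= threshold else None), reliability


# Toy data: the classes differ only in the last feature.
train_x = [[0, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 10], [1, 0, 10], [0, 1, 10]]
train_y = ["a", "a", "a", "b", "b", "b"]

early = classify_if_reliable([0, 0], train_x, train_y, threshold=0.9, k=4)
late = classify_if_reliable([0, 0, 10], train_x, train_y, threshold=0.9, k=3)
```

With two of three features observed, the sketch withholds a label because the estimated reliability (0.75) is below the threshold; once the third feature arrives, it returns class "b" with reliability 1.0.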
Authors:
Parrish, Nathan [1]; Anderson, Hyrum S. [2]; Gupta, Maya R. [3]; Hsaio, Dun Yu [4]
  1. Johns Hopkins Univ., Baltimore, MD (United States). Applied Physics Lab.
  2. Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
  3. Google Research, Mountain View, CA (United States)
  4. Univ. of Washington, Seattle, WA (United States). Dept. of Electrical Engineering
Publication Date:
Report Number(s):
SAND2012-5243J
Journal ID: ISSN 1532-4435; 547347
Grant/Contract Number:
AC04-94AL85000
Type:
Accepted Manuscript
Journal Name:
Journal of Machine Learning Research
Additional Journal Information:
Journal Volume: 14; Journal Issue: 1; Journal ID: ISSN 1532-4435
Publisher:
JMLR
Research Org:
Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
Sponsoring Org:
USDOE National Nuclear Security Administration (NNSA); US Department of the Navy, Office of Naval Research (ONR)
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; 96 KNOWLEDGE MANAGEMENT AND PRESERVATION; classification; sensor networks; signals; reliability
OSTI Identifier:
1426914

Parrish, Nathan, Anderson, Hyrum S., Gupta, Maya R., and Hsaio, Dun Yu. Classifying with confidence from incomplete information. United States: N. p., Web.
Parrish, Nathan, Anderson, Hyrum S., Gupta, Maya R., & Hsaio, Dun Yu. Classifying with confidence from incomplete information. United States.
Parrish, Nathan, Anderson, Hyrum S., Gupta, Maya R., and Hsaio, Dun Yu. 2013. "Classifying with confidence from incomplete information." United States. https://www.osti.gov/servlets/purl/1426914.
@article{osti_1426914,
title = {Classifying with confidence from incomplete information},
author = {Parrish, Nathan and Anderson, Hyrum S. and Gupta, Maya R. and Hsaio, Dun Yu},
abstractNote = {In this paper, we consider the problem of classifying a test sample given incomplete information. This problem arises naturally when data about a test sample is collected over time, or when costs must be incurred to compute the classification features. For example, in a distributed sensor network only a fraction of the sensors may have reported measurements at a certain time, and additional time, power, and bandwidth are needed to collect the complete data to classify. A practical goal is to assign a class label as soon as enough data is available to make a good decision. We formalize this goal through the notion of reliability (the probability that a label assigned given incomplete data would be the same as the label assigned given the complete data), and we propose a method to classify incomplete data only if some reliability threshold is met. Our approach models the complete data as a random variable whose distribution is dependent on the current incomplete data and the (complete) training data. The method differs from standard imputation strategies in that our focus is on determining the reliability of the classification decision, rather than just the class label. We show that the method provides useful reliability estimates of the correctness of the imputed class labels on a set of experiments on time-series data sets, where the goal is to classify the time-series as early as possible while still guaranteeing that the reliability threshold is met.},
journal = {Journal of Machine Learning Research},
number = 1,
volume = 14,
place = {United States},
year = {2013},
month = {12}
}