OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: A statistical approach to combining multisource information in one-class classifiers

Abstract

A new method is introduced in this paper for combining information from multiple sources to support one-class classification. The contributing sources may represent measurements taken by different sensors of the same physical entity, repeated measurements by a single sensor, or numerous features computed from a single measured image or signal. The approach utilizes the theory of statistical hypothesis testing, and applies Fisher's technique for combining p-values, modified to handle nonindependent sources. Classifier outputs take the form of fused p-values, which may be used to gauge the consistency of unknown entities with one or more class hypotheses. The approach enables rigorous assessment of classification uncertainties, and allows for traceability of classifier decisions back to the constituent sources, both of which are important for high-consequence decision support. Application of the technique is illustrated in two challenge problems, one for skin segmentation and the other for terrain labeling. Finally, the method is seen to be particularly effective for relatively small training samples.
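As a rough illustration of the combination step named in the abstract, the sketch below implements only the classical Fisher method, which assumes independent sources: the statistic X = -2 Σ ln(p_i) is referred to a chi-square distribution with 2k degrees of freedom. The paper's modification for nonindependent sources is not described on this page and is not reproduced here. The function name `fisher_combine` is hypothetical and chosen for this example.

```python
import math


def fisher_combine(p_values):
    """Fuse p-values with the classical Fisher method.

    Assumption: the sources are independent. This is NOT the
    dependence-adjusted variant described in the paper.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")

    k = len(p_values)
    # half_stat is X / 2, where X = -2 * sum(ln p_i) is Fisher's statistic.
    half_stat = -sum(math.log(p) for p in p_values)

    # Chi-square survival function for an even number of degrees of
    # freedom (2k) has the closed form
    #   exp(-X/2) * sum_{j=0}^{k-1} (X/2)^j / j!
    term = 1.0
    total = 1.0
    for j in range(1, k):
        term *= half_stat / j
        total += term
    return math.exp(-half_stat) * total


if __name__ == "__main__":
    # Two sources that are each mildly inconsistent with the class hypothesis.
    print(fisher_combine([0.05, 0.05]))
```

A small fused p-value suggests that the unknown entity is inconsistent with the class hypothesis. With a single source the fused value reduces to the input p-value, which is a convenient sanity check.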

Authors:
Simonson, Katherine M. [1]; West, R. Derek [1]; Hansen, Ross L. [1]; LaBruyere, Thomas E. [1]; Van Benthem, Mark H. [1]
  1. Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
Publication Date:
Research Org.:
Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
Sponsoring Org.:
USDOE National Nuclear Security Administration (NNSA); SNL Laboratory Directed Research and Development (LDRD) program
OSTI Identifier:
1399493
Report Number(s):
SAND2017-2026J
Journal ID: ISSN 1932-1864; 651299
Grant/Contract Number:
AC04-94AL85000
Resource Type:
Journal Article: Accepted Manuscript
Journal Name:
Statistical Analysis and Data Mining
Additional Journal Information:
Journal Volume: 10; Journal Issue: 4; Journal ID: ISSN 1932-1864
Publisher:
Wiley
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; classification; dependent p-values; Fisher's combination method; gamma distribution; image segmentation; multisource fusion

Citation Formats

Simonson, Katherine M., West, R. Derek, Hansen, Ross L., LaBruyere, Thomas E., and Van Benthem, Mark H. A statistical approach to combining multisource information in one-class classifiers. United States: N. p., 2017. Web. doi:10.1002/sam.11342.
Simonson, Katherine M., West, R. Derek, Hansen, Ross L., LaBruyere, Thomas E., & Van Benthem, Mark H. A statistical approach to combining multisource information in one-class classifiers. United States. doi:10.1002/sam.11342.
Simonson, Katherine M., West, R. Derek, Hansen, Ross L., LaBruyere, Thomas E., and Van Benthem, Mark H. 2017. "A statistical approach to combining multisource information in one-class classifiers". United States. doi:10.1002/sam.11342.
@article{osti_1399493,
title = {A statistical approach to combining multisource information in one-class classifiers},
author = {Simonson, Katherine M. and West, R. Derek and Hansen, Ross L. and LaBruyere, Thomas E. and Van Benthem, Mark H.},
abstractNote = {A new method is introduced in this paper for combining information from multiple sources to support one-class classification. The contributing sources may represent measurements taken by different sensors of the same physical entity, repeated measurements by a single sensor, or numerous features computed from a single measured image or signal. The approach utilizes the theory of statistical hypothesis testing, and applies Fisher's technique for combining p-values, modified to handle nonindependent sources. Classifier outputs take the form of fused p-values, which may be used to gauge the consistency of unknown entities with one or more class hypotheses. The approach enables rigorous assessment of classification uncertainties, and allows for traceability of classifier decisions back to the constituent sources, both of which are important for high-consequence decision support. Application of the technique is illustrated in two challenge problems, one for skin segmentation and the other for terrain labeling. Finally, the method is seen to be particularly effective for relatively small training samples.},
doi = {10.1002/sam.11342},
journal = {Statistical Analysis and Data Mining},
number = 4,
volume = 10,
place = {United States},
year = 2017,
month = 6
}

Journal Article:
Free Publicly Available Full Text
This content will become publicly available on June 8, 2018
Publisher's Version of Record

  • Ensemble classifiers have been shown efficient in multiple applications. In this article, the authors explore the effectiveness of ensemble classifiers in a case-based computer-aided diagnosis system for detection of masses in mammograms. They evaluate two general ways of constructing subclassifiers by resampling of the available development dataset: Random division and random selection. Furthermore, they discuss the problem of selecting the ensemble size and propose two adaptive incremental techniques that automatically select the size for the problem at hand. All the techniques are evaluated with respect to a previously proposed information-theoretic CAD system (IT-CAD). The experimental results show that the examined ensemble techniques provide a statistically significant improvement (AUC=0.905±0.024) in performance as compared to the original IT-CAD system (AUC=0.865±0.029). Some of the techniques allow for a notable reduction in the total number of examples stored in the case base (to 1.3% of the original size), which, in turn, results in lower storage requirements and a shorter response time of the system. Among the methods examined in this article, the two proposed adaptive techniques are by far the most effective for this purpose. Furthermore, the authors provide some discussion and guidance for choosing the ensemble parameters.
  • This paper presents new results concerning selection of an optimal information fusion formula for an ensemble of Lipschitz classifiers. The goal of information fusion is to create an integral classifier that provides better generalization ability for the ensemble while achieving a practically acceptable level of effectiveness. The problem of information fusion is very relevant for data processing in multi-channel C-OTDR monitoring systems. In this case we have to effectively classify targeted events which appear in the vicinity of the monitored object. The solution of this problem is based on an ensemble of Lipschitz classifiers, each of which corresponds to a respective channel. We suggest a new method for information fusion for an ensemble of Lipschitz classifiers. This method is called “The Weighing of Inversely as Lipschitz Constants” (WILC). Results of practical usage of the WILC method in multichannel C-OTDR monitoring systems are presented.
  • Since 1975, Petrobras has worked with Brazilian Portland cement manufacturers to develop high-quality Class G cements. The Petrobras R and D Center has analyzed each batch of Class G cement manufactured by prequalified producers to API Spec. 10 standards and to Brazilian Assoc. of Technical Standards (ABNT) NBR 9831 standards. As a consequence, the Drilling Dept. at Petrobras now is supplied by three approved Class G cement factories strategically located in Brazil. This paper statistically analyzes test results on the basis of physical parameters of these Class G cements over 3 years. Statistical indices are reported to evaluate dispersion of the physical properties to obtain a reliability index for each Class G cement.
  • Manual labeling of individual ¹⁹²Ir seeds in localization images for dosimetry of multi-strand low-dose-rate (LDR) implants is labor intensive, tedious and prone to error. The objective of this investigation is to develop computer-based methods that analyze digitized localization images, improve dosimetric efficiency, and reduce labeling errors.