
Title: Selecting a Classification Ensemble and Detecting Process Drift in an Evolving Data Stream

Abstract

We characterize the commercial behavior of a group of companies in a common line of business by applying a small ensemble of classifiers to a stream of records containing commercial activity information. The approach finds a subset of classifiers that predicts company labels with reasonable accuracy. The performance of the ensemble, measured as its error rate under stable conditions, can be characterized with an exponentially weighted moving average (EWMA) statistic. The behavior of the EWMA statistic can then be used to monitor a record stream from the commercial network and determine when significant changes have occurred. Results indicate that larger classification ensembles are not necessarily optimal, pointing to the need to search the combinatorial classifier space systematically. Results also show that the current and past performance of an ensemble can be used to detect statistically significant changes in the activity of the network. The dataset used in this work contains tens of thousands of high-level commercial activity records with continuous and categorical variables and hundreds of labels, making classification challenging.
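
The abstract's monitoring idea can be illustrated with a textbook EWMA control chart applied to a stream of 0/1 misclassification indicators. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `ewma_drift_monitor`, the smoothing weight `lam`, the limit multiplier `L`, and the Bernoulli variance model for the stable error rate `p0` are all hypothetical choices made here for illustration.

```python
from typing import Iterable, List, Optional, Tuple


def ewma_drift_monitor(
    errors: Iterable[int],
    p0: float,
    lam: float = 0.1,   # assumed smoothing weight, 0 < lam <= 1
    L: float = 3.0,     # assumed control-limit multiplier
) -> Tuple[List[float], Optional[int]]:
    """Track an EWMA of 0/1 ensemble errors and flag the first excursion.

    errors: 1 if the ensemble misclassified a record, else 0.
    p0:     error rate estimated under stable conditions (assumption:
            errors behave like independent Bernoulli(p0) draws).
    Returns the EWMA values and the 0-based index of the first record
    whose EWMA falls outside the control limits (None if no alarm).
    """
    sigma = (p0 * (1.0 - p0)) ** 0.5   # std. dev. of a single Bernoulli error
    z = p0                             # start the EWMA at the stable rate
    values: List[float] = []
    alarm: Optional[int] = None
    for t, e in enumerate(errors, start=1):
        # EWMA recursion: z_t = lam * x_t + (1 - lam) * z_{t-1}
        z = lam * e + (1.0 - lam) * z
        # Time-varying half-width of the standard EWMA control limits
        half_width = L * sigma * (
            (lam / (2.0 - lam)) * (1.0 - (1.0 - lam) ** (2 * t))
        ) ** 0.5
        values.append(z)
        if alarm is None and abs(z - p0) > half_width:
            alarm = t - 1
    return values, alarm


# Synthetic stream: 200 records at a 20% error rate, then every record wrong.
stable = [0, 0, 0, 0, 1] * 40
drifted = stable + [1] * 50
_, alarm_stable = ewma_drift_monitor(stable, p0=0.2)
_, alarm_drift = ewma_drift_monitor(drifted, p0=0.2)
print(alarm_stable, alarm_drift)
```

On the synthetic stream, the stable segment should raise no alarm, and the alarm should appear shortly after the error rate jumps. A smaller `lam` smooths more and reacts more slowly; a larger `L` trades detection speed for fewer false alarms.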

Authors:
Heredia-Langner, Alejandro; Rodriguez, Luke R.; Lin, Andy; Webster, Jennifer B.
Publication Date:
September 2015
Research Org.:
Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1334906
Report Number(s):
PNNL-SA-109834
DOE Contract Number:
AC05-76RL01830
Resource Type:
Conference
Resource Relation:
Conference: DMIN'15: The 11th International Conference on Data Mining, July 27-30, 2015, Las Vegas, Nevada, 31-36
Country of Publication:
United States
Language:
English
Subject:
Ensemble classifiers; EWMA; optimization

Citation Formats

Heredia-Langner, Alejandro, Rodriguez, Luke R., Lin, Andy, and Webster, Jennifer B. Selecting a Classification Ensemble and Detecting Process Drift in an Evolving Data Stream. United States: N. p., 2015. Web.
Heredia-Langner, Alejandro, Rodriguez, Luke R., Lin, Andy, & Webster, Jennifer B. (2015). Selecting a Classification Ensemble and Detecting Process Drift in an Evolving Data Stream. United States.
Heredia-Langner, Alejandro, Rodriguez, Luke R., Lin, Andy, and Webster, Jennifer B. 2015. "Selecting a Classification Ensemble and Detecting Process Drift in an Evolving Data Stream". United States.
@inproceedings{osti_1334906,
title = {Selecting a Classification Ensemble and Detecting Process Drift in an Evolving Data Stream},
author = {Heredia-Langner, Alejandro and Rodriguez, Luke R. and Lin, Andy and Webster, Jennifer B.},
abstractNote = {We characterize the commercial behavior of a group of companies in a common line of business by applying a small ensemble of classifiers to a stream of records containing commercial activity information. The approach finds a subset of classifiers that predicts company labels with reasonable accuracy. The performance of the ensemble, measured as its error rate under stable conditions, can be characterized with an exponentially weighted moving average (EWMA) statistic. The behavior of the EWMA statistic can then be used to monitor a record stream from the commercial network and determine when significant changes have occurred. Results indicate that larger classification ensembles are not necessarily optimal, pointing to the need to search the combinatorial classifier space systematically. Results also show that the current and past performance of an ensemble can be used to detect statistically significant changes in the activity of the network. The dataset used in this work contains tens of thousands of high-level commercial activity records with continuous and categorical variables and hundreds of labels, making classification challenging.},
booktitle = {DMIN'15: The 11th International Conference on Data Mining},
pages = {31--36},
place = {United States},
year = {2015},
month = {9}
}
