skip to main content
OSTI.GOV title logo U.S. Department of Energy
Office of Scientific and Technical Information

Title: Mining for Surprise Events within Text Streams

Abstract

This paper summarizes algorithms and analysis methodology for mining the evolving content in text streams. Text streams include news, press releases from organizations, speeches, Internet blogs, etc. These data are a fundamental source for detecting and characterizing strategic intent of individuals and organizations as well as for detecting abrupt or surprising events within communities. Specifically, an analyst may need to know if and when the topic within a text stream changes. Much of the current text feature methodology is focused on understanding and analyzing a single static collection of text documents. Corresponding analytic activities include summarizing the contents of the collection, grouping the documents based on similarity of content, and calculating concise summaries of the resulting groups. The approach reported here focuses on taking advantage of the temporal characteristics in a text stream to identify relevant features (such as change in content), and also on the analysis and algorithmic methodology to communicate these characteristics to a user. We present a variety of algorithms for detecting essential features within a text stream. A critical finding is that the characteristics used to identify features in a text stream are uncorrelated with the characteristics used to identify features in a static document collection.more » Our approach for communicating the information back to the user is to identify feature (word/phrase) groups. These resulting algorithms form the basis of developing software tools for a user to analyze and understand the content of text streams. We present analysis using both news information and abstracts from technical articles, and show how these algorithms provide understanding of the contents of these text streams.« less

Authors:
; ;
Publication Date:
Research Org.:
Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
977339
Report Number(s):
PNNL-SA-62929
400470000; TRN: US201013%%371
DOE Contract Number:  
AC05-76RL01830
Resource Type:
Conference
Resource Relation:
Conference: Proceedings of the SIAM International Conference on Data Mining (SDM 2009), 1(1):617-627
Country of Publication:
United States
Language:
English
Subject:
99 GENERAL AND MISCELLANEOUS//MATHEMATICS, COMPUTING, AND INFORMATION SCIENCE; ALGORITHMS; INTERNET; INFORMATION RETRIEVAL; DOCUMENT TYPES; DATA ANALYSIS

Citation Formats

Whitney, Paul D, Engel, David W, and Cramer, Nicholas O. Mining for Surprise Events within Text Streams. United States: N. p., 2009. Web.
Whitney, Paul D, Engel, David W, & Cramer, Nicholas O. Mining for Surprise Events within Text Streams. United States.
Whitney, Paul D, Engel, David W, and Cramer, Nicholas O. 2009. "Mining for Surprise Events within Text Streams". United States.
@article{osti_977339,
title = {Mining for Surprise Events within Text Streams},
author = {Whitney, Paul D and Engel, David W and Cramer, Nicholas O},
abstractNote = {This paper summarizes algorithms and analysis methodology for mining the evolving content in text streams. Text streams include news, press releases from organizations, speeches, Internet blogs, etc. These data are a fundamental source for detecting and characterizing strategic intent of individuals and organizations as well as for detecting abrupt or surprising events within communities. Specifically, an analyst may need to know if and when the topic within a text stream changes. Much of the current text feature methodology is focused on understanding and analyzing a single static collection of text documents. Corresponding analytic activities include summarizing the contents of the collection, grouping the documents based on similarity of content, and calculating concise summaries of the resulting groups. The approach reported here focuses on taking advantage of the temporal characteristics in a text stream to identify relevant features (such as change in content), and also on the analysis and algorithmic methodology to communicate these characteristics to a user. We present a variety of algorithms for detecting essential features within a text stream. A critical finding is that the characteristics used to identify features in a text stream are uncorrelated with the characteristics used to identify features in a static document collection. Our approach for communicating the information back to the user is to identify feature (word/phrase) groups. These resulting algorithms form the basis of developing software tools for a user to analyze and understand the content of text streams. We present analysis using both news information and abstracts from technical articles, and show how these algorithms provide understanding of the contents of these text streams.},
doi = {},
url = {https://www.osti.gov/biblio/977339}, journal = {},
number = ,
volume = ,
place = {United States},
year = {Thu Apr 30 00:00:00 EDT 2009},
month = {Thu Apr 30 00:00:00 EDT 2009}
}

Conference:
Other availability
Please see Document Availability for additional information on obtaining the full-text document. Library patrons may search WorldCat to identify libraries that hold this conference proceeding.

Save / Share: