Mining for Surprise Events within Text Streams
Abstract
This paper summarizes algorithms and analysis methodology for mining the evolving content in text streams. Text streams include news, press releases from organizations, speeches, Internet blogs, etc. These data are a fundamental source for detecting and characterizing strategic intent of individuals and organizations as well as for detecting abrupt or surprising events within communities. Specifically, an analyst may need to know if and when the topic within a text stream changes. Much of the current text feature methodology is focused on understanding and analyzing a single static collection of text documents. Corresponding analytic activities include summarizing the contents of the collection, grouping the documents based on similarity of content, and calculating concise summaries of the resulting groups. The approach reported here focuses on taking advantage of the temporal characteristics in a text stream to identify relevant features (such as change in content), and also on the analysis and algorithmic methodology to communicate these characteristics to a user. We present a variety of algorithms for detecting essential features within a text stream. A critical finding is that the characteristics used to identify features in a text stream are uncorrelated with the characteristics used to identify features in a static document collection. Our approach for communicating the information back to the user is to identify feature (word/phrase) groups. These resulting algorithms form the basis of developing software tools for a user to analyze and understand the content of text streams. We present analysis using both news information and abstracts from technical articles, and show how these algorithms provide understanding of the contents of these text streams.
- Authors:
- Whitney, Paul D; Engel, David W; Cramer, Nicholas O
- Publication Date:
- 2009-04-30
- Research Org.:
- Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
- Sponsoring Org.:
- USDOE
- OSTI Identifier:
- 977339
- Report Number(s):
- PNNL-SA-62929
- 400470000; TRN: US201013%%371
- DOE Contract Number:
- AC05-76RL01830
- Resource Type:
- Conference
- Resource Relation:
- Conference: Proceedings of the SIAM International Conference on Data Mining (SDM 2009), 1(1):617-627
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 99 GENERAL AND MISCELLANEOUS//MATHEMATICS, COMPUTING, AND INFORMATION SCIENCE; ALGORITHMS; INTERNET; INFORMATION RETRIEVAL; DOCUMENT TYPES; DATA ANALYSIS
Citation Formats
Whitney, Paul D, Engel, David W, and Cramer, Nicholas O. Mining for Surprise Events within Text Streams. United States: N. p., 2009. Web.
Whitney, Paul D, Engel, David W, & Cramer, Nicholas O. Mining for Surprise Events within Text Streams. United States.
Whitney, Paul D, Engel, David W, and Cramer, Nicholas O. 2009. "Mining for Surprise Events within Text Streams". United States.
@article{osti_977339,
title = {Mining for Surprise Events within Text Streams},
author = {Whitney, Paul D and Engel, David W and Cramer, Nicholas O},
abstractNote = {This paper summarizes algorithms and analysis methodology for mining the evolving content in text streams. Text streams include news, press releases from organizations, speeches, Internet blogs, etc. These data are a fundamental source for detecting and characterizing strategic intent of individuals and organizations as well as for detecting abrupt or surprising events within communities. Specifically, an analyst may need to know if and when the topic within a text stream changes. Much of the current text feature methodology is focused on understanding and analyzing a single static collection of text documents. Corresponding analytic activities include summarizing the contents of the collection, grouping the documents based on similarity of content, and calculating concise summaries of the resulting groups. The approach reported here focuses on taking advantage of the temporal characteristics in a text stream to identify relevant features (such as change in content), and also on the analysis and algorithmic methodology to communicate these characteristics to a user. We present a variety of algorithms for detecting essential features within a text stream. A critical finding is that the characteristics used to identify features in a text stream are uncorrelated with the characteristics used to identify features in a static document collection. Our approach for communicating the information back to the user is to identify feature (word/phrase) groups. These resulting algorithms form the basis of developing software tools for a user to analyze and understand the content of text streams. We present analysis using both news information and abstracts from technical articles, and show how these algorithms provide understanding of the contents of these text streams.},
url = {https://www.osti.gov/biblio/977339},
journal = {Proceedings of the SIAM International Conference on Data Mining (SDM 2009)},
volume = {1},
number = {1},
pages = {617--627},
place = {United States},
year = {2009},
month = apr
}