Title: Mining for Emerging Technologies within Text Streams and Documents

Conference · OSTI ID: 978530

Text streams, collections of documents or messages that are generated and observed over time, are ubiquitous. Our research and development effort targets algorithms that find and characterize changes in topic within text streams. To date, this research has demonstrated the ability to detect and describe 1) short-duration, atypical events and 2) the emergence of longer-term shifts in topical content. This technology has been applied to predefined, temporally ordered document collections but is also suitable for near-real-time textual data streams. The underlying event and emergence detection algorithms have been connected to event detection software named SURPRISE. This software provides an interactive graphical user interface and tools for manipulating and correlating the terms and scores identified by the algorithms. Additionally, SURPRISE has been interfaced with the IN-SPIRE text analytics tool so that an analyst can evaluate the surprising or emerging terms via a visualization of the entire document collection. IN-SPIRE assists in the exploration of related topics, events, and views, which are currently based on single-term events. The focus of this research is to contribute to detecting, and preventing, strategic surprise.
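
The abstract does not specify how terms are scored, so the following is only a minimal sketch of one common way to flag short-duration, atypical single-term events in a temporally ordered collection. It assumes documents are already grouped into time bins and scores each term in the newest bin with a z-score against a trailing window. The function names `term_counts` and `surprise_scores`, the `window` and `min_std` parameters, and the z-score itself are illustrative assumptions, not the algorithm used in SURPRISE.

```python
from collections import Counter
from statistics import mean, pstdev


def term_counts(documents):
    """Count how many documents in one time bin contain each term."""
    counts = Counter()
    for text in documents:
        # Use a set so a term is counted once per document.
        counts.update(set(text.lower().split()))
    return counts


def surprise_scores(bins, window=3, min_std=1.0):
    """Score each term in the newest bin against the preceding bins.

    `bins` is a time-ordered list of lists of document strings.
    The score is a z-score (assumed here for illustration):
    (current count - trailing mean) / trailing standard deviation,
    with the deviation floored at `min_std` to avoid division by zero.
    """
    if len(bins) < 2:
        return {}
    # Up to `window` bins immediately before the newest one.
    history = [term_counts(b) for b in bins[-(window + 1):-1]]
    current = term_counts(bins[-1])
    scores = {}
    for term, count in current.items():
        past = [h[term] for h in history]  # missing terms count as 0
        spread = max(pstdev(past), min_std)
        scores[term] = (count - mean(past)) / spread
    return scores


# Hypothetical toy stream: three quiet bins, then a burst of a new term.
bins = [
    ["budget meeting"],
    ["budget review"],
    ["budget meeting"],
    ["outbreak reported", "outbreak spreads", "budget outbreak"],
]
scores = surprise_scores(bins)
top_term = max(scores, key=scores.get)
```

In this toy stream the term that appears suddenly in the last bin receives the highest score, while a term that occurs at a steady rate scores zero. Detecting longer-term emergence would need a different measure, such as a trend over a longer window, which this sketch does not attempt.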

Research Organization:
Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
Sponsoring Organization:
USDOE
DOE Contract Number:
AC05-76RL01830
OSTI ID:
978530
Report Number(s):
PNNL-SA-64618; 400470000; TRN: US201010%%8
Resource Relation:
Conference: Society for Industrial and Applied Mathematics - 9th SIAM International Conference on Data Mining 2009: Proceedings in Applied Mathematics, 3:1291-1301
Country of Publication:
United States
Language:
English