Mining for Emerging Technologies within Text Streams and Documents
Text streams, collections of documents or messages that are generated and observed over time, are ubiquitous. Our research and development aims to produce algorithms that find and characterize changes in topic within text streams. To date, this research has demonstrated the ability to detect and describe 1) short-duration, atypical events and 2) the emergence of longer-term shifts in topical content. This technology has been applied to pre-defined, temporally ordered document collections but is also suitable for application to near-real-time textual data streams. The underlying event and emergence detection algorithms have been interfaced to an event detection software tool named SURPRISE. This software provides an interactive graphical user interface and tools for manipulating and correlating the terms and scores identified by the algorithms. Additionally, SURPRISE has been interfaced with the IN-SPIRE text analytics tool to enable an analyst to evaluate the surprising or emerging terms via a visualization of the entire document collection. IN-SPIRE assists in the exploration of related topics, events, and views, currently based on single-term events. The focus of this research is to contribute to detecting and preventing strategic surprise.
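The abstract does not specify how terms are scored, so the following is only an illustrative sketch of single-term event detection in a temporally ordered collection, not the algorithm behind SURPRISE. It assumes the stream has already been split into tokenized time windows and scores each term in the newest window by how far its relative frequency departs from its own history. The function name `surprise_scores`, the `min_history` parameter, and the variance floor are all hypothetical choices made for this example.

```python
from collections import Counter
from math import sqrt


def surprise_scores(windows, min_history=2):
    """Score terms in the latest window against their own history.

    windows: list of token lists, one per time step, oldest first.
    Returns {term: score}; large positive scores mark terms that are
    unusually frequent now compared with earlier windows.
    NOTE: a simple z-score heuristic assumed for illustration only.
    """
    *history, current = windows
    if len(history) < min_history:
        raise ValueError("need at least min_history earlier windows")

    # Relative frequency of each term in every historical window.
    hist_freqs = []
    for tokens in history:
        counts = Counter(tokens)
        total = max(len(tokens), 1)
        hist_freqs.append({t: c / total for t, c in counts.items()})

    cur_counts = Counter(current)
    cur_total = max(len(current), 1)

    scores = {}
    for term, count in cur_counts.items():
        past = [f.get(term, 0.0) for f in hist_freqs]
        mean = sum(past) / len(past)
        var = sum((p - mean) ** 2 for p in past) / len(past)
        # Floor the spread so terms with a flat history (including
        # never-before-seen terms) do not divide by zero.
        std = max(sqrt(var), 1e-3)
        scores[term] = (count / cur_total - mean) / std
    return scores
```

Sorting the returned dictionary by score gives a ranked list of candidate single-term events for the newest window. Detecting longer-term emergence would need a different statistic, for example a trend fitted over many windows, which this sketch does not attempt.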
- Research Organization: Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
- Sponsoring Organization: USDOE
- DOE Contract Number: AC05-76RL01830
- OSTI ID: 978530
- Report Number(s): PNNL-SA-64618; 400470000; TRN: US201010%%8
- Resource Relation: Conference: Society for Industrial and Applied Mathematics - 9th SIAM International Conference on Data Mining 2009: Proceedings in Applied Mathematics, 3:1291-1301
- Country of Publication: United States
- Language: English
Similar Records
Mining for Surprise Events within Text Streams
Finding Text Information in the Ocean of Electronic Documents