ParaText : scalable text analysis and visualization.
Conference · OSTI ID:1021689
Automated analysis of unstructured text documents (e.g., web pages, newswire articles, research publications, business reports) is a key capability for solving important problems in areas including decision making, risk assessment, social network analysis, intelligence analysis, scholarly research and others. However, as data sizes continue to grow in these areas, scalable processing, modeling, and semantic analysis of text collections become essential. In this paper, we present the ParaText text analysis engine, a distributed memory software framework for processing, modeling, and analyzing collections of unstructured text documents. Results on several document collections using hundreds of processors are presented to illustrate the flexibility, extensibility, and scalability of the entire process of text modeling, from raw data ingestion to application analysis.
- Research Organization: Sandia National Laboratories
- Sponsoring Organization: USDOE
- DOE Contract Number: AC04-94AL85000
- OSTI ID: 1021689
- Report Number(s): SAND2010-4595C
- Country of Publication: United States
- Language: English
Similar Records
ParaText : scalable text modeling and analysis.
Conference · Tue Jun 01 00:00:00 EDT 2010 · OSTI ID:1020434
TexTonic: Interactive Visualization for Exploration and discovery of Very Large Text Collections
Journal Article · Sun Jun 30 20:00:00 EDT 2019 · Information Visualization · OSTI ID:1576230
Visual Analysis of Text Document Collections
Conference · Tue Nov 29 23:00:00 EST 2005 · OSTI ID:1092725