Domain-independent information extraction in unstructured text

Irwin, N H

doi:10.2172/378821


Technical Report · September 1, 1996

DOI: https://doi.org/10.2172/378821 · OSTI ID: 378821

Irwin, N H

Sandia National Labs., Albuquerque, NM (United States). Software Surety Dept.

Extracting information from unstructured text has become an important research area in recent years because of the large amount of text now available electronically. This status report describes the findings and work done during the second year of a two-year Laboratory Directed Research and Development project. Building on the first year's work of identifying important entities, this report details techniques used to group words into semantic categories and to output templates containing selected document content. Because word profiles and category clusters are derived during a training run, the time-consuming knowledge-building task can be avoided. Though the output is still less complete than that of systems with domain-specific knowledge bases, the results look promising. The two approaches are compatible and could complement each other within the same system. Domain-independent approaches retain their appeal because a system that adapts and learns will soon outpace a system with any amount of a priori knowledge.
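
The abstract does not specify how word profiles or category clusters are computed, so the following is only an illustrative sketch of the general idea, not the report's method. It assumes a word profile is a count of neighboring words within a small window, and that words are grouped by a greedy pass using cosine similarity. The function names, the window size, and the threshold are all hypothetical choices made for this example.

```python
from collections import Counter, defaultdict
from math import sqrt


def word_profiles(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` tokens of it.

    Assumption: a "profile" is a bag of nearby context words.
    """
    profiles = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    profiles[word][tokens[j]] += 1
    return profiles


def cosine(a, b):
    """Cosine similarity between two Counters treated as sparse vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster(profiles, threshold=0.9):
    """Greedy one-pass grouping (hypothetical): a word joins the first cluster
    whose first member has a profile at least `threshold` similar to its own."""
    clusters = []
    for word in sorted(profiles):
        for group in clusters:
            if cosine(profiles[word], profiles[group[0]]) >= threshold:
                group.append(word)
                break
        else:
            clusters.append([word])
    return clusters


# Toy training text, invented for this example.
sentences = [
    "the treaty was signed in geneva",
    "the treaty was signed in vienna",
    "the accord was signed in geneva",
]
clusters = cluster(word_profiles(sentences))
print(clusters)
```

On this toy input, words that occur in the same contexts ("treaty" and "accord", "geneva" and "vienna") fall into the same group without any hand-built domain knowledge, which is the property the abstract attributes to the training run.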

Research Organization: Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

Sponsoring Organization: USDOE, Washington, DC (United States)

DOE Contract Number: AC04-94AL85000

OSTI ID: 378821

Report Number(s): SAND-96-2337; ON: DE96015325; TRN: AHC29620%%34

Resource Relation: Other Information: PBD: Sep 1996

Country of Publication: United States

Language: English

Similar Records

Extraction of information from unstructured text

Technical Report · November 1, 1995 · OSTI ID: 378821

Irwin, N H; DeLand, S M; Crowder, S V

Information Extraction from Unstructured Text for the Biodefense Knowledge Center

Conference · April 29, 2005 · OSTI ID: 378821

Samatova, N F; Park, B; Krishnamurthy, R; +6 more

Flexible and Scalable Data Fusion using Proactive Schemaless Information Services

Technical Report · May 1, 2014 · OSTI ID: 378821

Widener, Patrick

Related Subjects

99 MATHEMATICS, COMPUTERS, INFORMATION SCIENCE, MANAGEMENT, LAW, MISCELLANEOUS
INFORMATION SYSTEMS
INFORMATION RETRIEVAL
COMPUTER NETWORKS
NATURAL LANGUAGE
DOCUMENT TYPES
MEDICAL RECORDS
TREATIES
LEISURE TIME ACTIVITIES
