Experiments in automatic word class and word sense identification for information retrieval

Gauch, S; Futrelle, R P

Technical Report · Fri Dec 30 23:00:00 EST 1994

OSTI ID:68594

Gauch, S ^[1]; Futrelle, R P ^[2]

[1] Univ. of Kansas, Lawrence, KS (United States)
[2] Northeastern Univ., Boston, MA (United States)

Automatic identification of related words and automatic detection of word senses are two long-standing goals of researchers in natural language processing. Word class information and word sense identification may enhance the performance of information retrieval systems. Large online corpora and increased computational capabilities make new techniques based on corpus linguistics feasible. Corpus-based analysis is especially needed for corpora from specialized fields for which no electronic dictionaries or thesauri exist. The methods described here use a combination of mutual information and word context to establish word similarities. Then, unsupervised classification is done using clustering in the word space, identifying word classes without pretagging. We also describe an extension of the method to handle the difficult problems of disambiguation and of determining part-of-speech and semantic information for low-frequency words. The method is powerful enough to produce high-quality results on a small corpus of 200,000 words from abstracts in a field of molecular biology.
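
The abstract names the ingredients (mutual information, word context, clustering) but not the exact formulas, so the following is only a minimal sketch of the general idea, not the authors' implementation. The window size, the use of positive pointwise mutual information as the weighting, cosine similarity as the word similarity, and single-link merging at a fixed threshold are all assumptions made for illustration, as are the function names `context_vectors`, `cosine`, and `cluster`.

```python
import math
from collections import Counter, defaultdict


def context_vectors(tokens, window=1):
    """Map each word to mutual-information weights of its positional neighbours.

    Assumption: contexts are (offset, neighbour) pairs within +/- window,
    weighted by pointwise mutual information; non-positive weights are dropped.
    """
    n = len(tokens)
    word_counts = Counter(tokens)
    pair_counts = defaultdict(Counter)
    for i, word in enumerate(tokens):
        for offset in range(-window, window + 1):
            j = i + offset
            if offset != 0 and 0 <= j < n:
                pair_counts[word][(offset, tokens[j])] += 1

    vectors = {}
    for word, contexts in pair_counts.items():
        vec = {}
        for (offset, neighbour), k in contexts.items():
            # PMI = log2( p(word, neighbour) / (p(word) * p(neighbour)) )
            pmi = math.log2(k * n / (word_counts[word] * word_counts[neighbour]))
            if pmi > 0:
                vec[(offset, neighbour)] = pmi
        vectors[word] = vec
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(x * v.get(key, 0.0) for key, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cluster(vectors, threshold=0.5):
    """Single-link clustering: merge any two words whose similarity >= threshold."""
    words = sorted(vectors)
    parent = {w: w for w in words}

    def find(w):
        # Union-find root lookup with path halving.
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    for i, a in enumerate(words):
        for b in words[i + 1:]:
            if cosine(vectors[a], vectors[b]) >= threshold:
                parent[find(a)] = find(b)

    groups = defaultdict(list)
    for w in words:
        groups[find(w)].append(w)
    return sorted(groups.values())


# Toy corpus (made up for illustration, not from the report).
tokens = "the cat sat . the dog sat . a cat ran . a dog ran .".split()
vectors = context_vectors(tokens)
classes = cluster(vectors, threshold=0.9)
```

In this toy corpus, words that occur in the same positional contexts (such as `cat` and `dog`) end up in the same class without any part-of-speech tags being supplied, which is the sense in which the classification is unsupervised.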

OSTI does not have a digital full text copy available. For more information, please see document availability, search WorldCat, or search Google Scholar.

Research Organization: Nevada Univ., Las Vegas, NV (United States)

OSTI ID: 68594

Report Number(s): CONF-9404212--

Country of Publication: United States

Language: English

Similar Records

LEARNING SEMANTICS-ENHANCED LANGUAGE MODELS APPLIED TO UNSUPERVISED WSD

Conference · Sun Jan 28 23:00:00 EST 2007 · OSTI ID:985889

Word prediction

Technical Report · Mon May 01 00:00:00 EDT 1995 · OSTI ID:123254

Word Domain Disambiguation via Word Sense Disambiguation

Conference · Sun Jun 04 00:00:00 EDT 2006 · OSTI ID:908504

Related Subjects

99 GENERAL AND MISCELLANEOUS
ACCURACY
INFORMATION RETRIEVAL
SPATIAL DISTRIBUTION
STANDARDIZED TERMINOLOGY
