Summary: Automatic Reduction of a Document-Derived Noun Vocabulary
Sven Anderson and S. Rebecca Thomas and Camden Segal
Bard College, Annandale-on-Hudson, NY 12504
{sanderso, thomas, cs471}@bard.edu
Yu Wu
Stanford University, Stanford, CA 94305
ywu2@stanford.edu
Abstract
We propose and evaluate five related algorithms that automatically
derive limited-size noun vocabularies from text documents of
2,000-30,000 words. The proposed algorithms combine Personalized
PageRank with principles of information maximization, and are applied
to the WordNet graph for nouns. For the best-performing algorithm, the
difference between automatically generated reduced noun lexicons and
those created by human writers is approximately 1-2 WordNet edges per
lexical item. Our results also indicate the importance of performing
word-sense disambiguation with sentence-level context information at
the earliest stage of analysis.
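
To make the Personalized PageRank component concrete, the following is a minimal sketch only, not the authors' method: the abstract does not specify any of the five algorithms. It runs power-iteration Personalized PageRank over a small hand-made hypernym graph standing in for the WordNet noun graph. The function name `personalized_pagerank`, the toy `EDGES`, the seed counts in `SEEDS`, and the damping value are all assumptions made for illustration.

```python
from collections import defaultdict


def personalized_pagerank(edges, seeds, damping=0.85, iterations=50):
    """Power-iteration Personalized PageRank on an undirected graph.

    edges: iterable of (node, node) pairs.
    seeds: dict mapping seed nodes to non-negative weights; the random
           walk teleports back to these nodes in proportion to weight.
    """
    # Build an undirected adjacency list from the edge pairs.
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    nodes = sorted(neighbors)

    # Teleport distribution: normalized seed weights, ignoring seeds
    # that do not occur in the graph.
    total = sum(w for n, w in seeds.items() if n in neighbors)
    if total <= 0:
        raise ValueError("at least one seed must be a node in the graph")
    teleport = {n: seeds.get(n, 0.0) / total for n in nodes}

    rank = dict(teleport)
    for _ in range(iterations):
        # Restart mass goes to the seeds; the rest flows along edges.
        nxt = {n: (1.0 - damping) * teleport[n] for n in nodes}
        for n in nodes:
            share = damping * rank[n] / len(neighbors[n])
            for m in neighbors[n]:
                nxt[m] += share
        rank = nxt
    return rank


# Hypothetical toy hypernym graph (NOT real WordNet data).
EDGES = [
    ("dog", "canine"), ("canine", "carnivore"),
    ("cat", "feline"), ("feline", "carnivore"),
    ("carnivore", "animal"), ("animal", "organism"),
    ("oak", "tree"), ("tree", "plant"), ("plant", "organism"),
]
# Hypothetical noun counts from a document: "dog" 3 times, "cat" twice.
SEEDS = {"dog": 3, "cat": 2}

ranks = personalized_pagerank(EDGES, SEEDS)
# Nodes ordered from highest to lowest score.
ranked_nodes = sorted(ranks, key=ranks.get, reverse=True)
```

In this sketch, nodes near the document's nouns (for example `carnivore`) receive more mass than distant ones (for example `plant`), which is one plausible way to score candidate entries for a reduced vocabulary. How the paper combines such scores with information maximization is not stated in the abstract.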