Summary: Distributional Word Clustering in Parallel
Alan L. Ritter, James W. Hearne, Philip A. Nelson
Computer Science, Western Washington University
Bellingham, WA 98225
ritter.alan@gmail.com, James.Hearne@cs.wwu.edu, phil@cs.wwu.edu
Abstract
We discuss various methods that have been applied to
grouping words into syntactic and semantic categories,
focusing on how they deal with the problems of sparsity
and computational complexity. We then present a method of
distributional clustering and discuss the parallelization
of the most computationally intensive part of this process.
1 Introduction
There are many reasons to group words into syntactic or
semantic categories from purely distributional information.
Examples include tagging words whose part-of-speech
properties are not known [12], improving the performance
of n-gram models [2], and exploratory data