Summary: Generating, Visualizing, and Evaluating
High-Quality Clusters for Information
Organization
Javed Aslam, Katya Pelekhov, and Daniela Rus
Department of Computer Science, Dartmouth College, Hanover NH 03755
{jaa,katya,rus}@cs.dartmouth.edu
Abstract. We present and analyze the star clustering algorithm. We
discuss an implementation of this algorithm that supports browsing and
document retrieval through information organization. We define three
parameters for evaluating a clustering algorithm to measure the topic
separation and topic aggregation achieved by the algorithm. In the absence
of benchmarks, we present a method for randomly generating clustering
data. Data from our user study shows evidence that the star algorithm
is effective for organizing information.
1 Introduction
Modern information systems have vast amounts of unorganized data. Users often
don't know what they need until they need it. In dynamic, time-pressured
situations such as emergency relief for weather disasters, presenting the results
of a query as a ranked list of hundreds of titles is ineffective. To cull the
critical information out of a large set of potentially useful sources we need methods