Summary: A new approach to clustering
Javed Aslam, Alain Leblanc, Clifford Stein
Department of Computer Science
Hanover, NH 03755
Abstract. In this paper we present a new approach for clustering data.
We concentrate on the case where the only available information about
the data is a similarity measure between every pair of elements, and
where the algorithm is expected to handle very noisy data. We make
no assumptions about the size and number of clusters, and the only assumption
we make about the data itself is that the similarity measure is
on average higher between elements belonging to the same cluster than
between elements belonging to different clusters. The algorithm relies on
very simple operations. The running time is dominated by matrix multiplication,
and in some cases curve-fitting. We will present experimental
results from various implementations of this method.
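
To make the input assumption concrete, the following Python sketch builds a synthetic pairwise similarity matrix in which similarities are higher on average within a cluster than across clusters, while individual entries remain noisy. It illustrates the assumed data model only and is not the clustering method itself; the function names, the cluster sizes, and the parameters p_same, p_diff, and noise are hypothetical choices made for this example.

```python
import random

def noisy_similarity(labels, p_same=0.7, p_diff=0.3, noise=0.25, seed=0):
    """Build a symmetric similarity matrix from hypothetical ground-truth labels.

    Each off-diagonal entry is drawn uniformly around p_same (same cluster)
    or p_diff (different clusters), so only the averages separate the two cases.
    """
    rng = random.Random(seed)
    n = len(labels)
    sim = [[1.0] * n for _ in range(n)]  # an element is fully similar to itself
    for i in range(n):
        for j in range(i + 1, n):
            mean = p_same if labels[i] == labels[j] else p_diff
            sim[i][j] = sim[j][i] = mean + rng.uniform(-noise, noise)
    return sim

def mean_similarity(sim, labels, same):
    """Average similarity over pairs that are (same=True) or are not in one cluster."""
    values = [sim[i][j]
              for i in range(len(labels))
              for j in range(i + 1, len(labels))
              if (labels[i] == labels[j]) == same]
    return sum(values) / len(values)

# Three clusters of unequal, arbitrarily chosen sizes; a clustering
# algorithm would see only `sim`, never `labels`.
labels = [0] * 20 + [1] * 30 + [2] * 10
sim = noisy_similarity(labels)
within = mean_similarity(sim, labels, same=True)
between = mean_similarity(sim, labels, same=False)
print(f"mean within-cluster similarity:  {within:.3f}")
print(f"mean between-cluster similarity: {between:.3f}")
```

With these parameters the two ranges of values overlap, so a single pairwise similarity does not reliably indicate whether two elements share a cluster; only the averages differ.
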
The problem of clustering a data set into subsets, each containing similar data,
has been very well studied. Several extensive studies on this topic have been written,
including those of Everitt, Kaufman and Rousseeuw, and Mirkin.