Summary: Practical challenges that arise when clustering the
web using spectral methods
Argimiro Arratia 1 and Carlos Marijuán 2
1 Llenguatges i Sistemes Informàtics, Universitat Politècnica de Catalunya, Spain,
argimiro@lsi.upc.es
2 Matemática Aplicada, Universidad de Valladolid, Spain, marijuan@mat.uva.es
Abstract. This is a report on an implementation of a spectral clustering algorithm
for classifying very large internet sites, with special emphasis on the practical
problems encountered in developing such a data mining system. Remarkably, some of
these technical difficulties are due to fundamental issues pertaining to the
mathematics involved, and are not treated properly in the literature. Others are
inherent to the functions and numerical methods of the high-level technical
computing environment that we use. We point out what these practical challenges
are and how to solve them.
Key words: spectral, clustering, internet
1 Introduction
Spectral clustering is a technique for partitioning data based on the spectrum
of a similarity matrix: a matrix which registers some pairwise similarity