Fast and Exact Out-of-Core K-Means Clustering
 

Summary: Fast and Exact Out-of-Core K-Means Clustering
Anjan Goswami, Ruoming Jin, Gagan Agrawal
Department of Computer Science and Engineering
Ohio State University
{goswamia,jinr,agrawal}@cse.ohio-state.edu
Abstract
Clustering has been one of the most widely studied topics
in data mining, and k-means clustering has been one of
the popular clustering algorithms. K-means requires several
passes on the entire dataset, which can make it very expensive
for large disk-resident datasets. In view of this, a lot of work
has been done on various approximate versions of k-means,
which require only one or a small number of passes on the
entire dataset.
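
To illustrate why the number of passes matters, here is a minimal sketch of the standard (Lloyd's) k-means iteration over data that is read in chunks, as it might be for a disk-resident dataset. It is not the algorithm proposed in the paper. The names `kmeans_passes` and `read_chunks` are hypothetical, and the chunked reader is assumed to be a zero-argument callable that returns a fresh iterator over the whole dataset each time it is called. Each outer iteration reads the full dataset once, so the pass count returned equals the number of full scans.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_passes(
    read_chunks: Callable[[], Iterable[List[Point]]],  # hypothetical chunked reader
    centers: Sequence[Point],
    max_iters: int = 100,
) -> Tuple[List[List[float]], int]:
    """Standard k-means; returns (final centers, number of full dataset passes)."""
    centers = [list(c) for c in centers]
    passes = 0
    for _ in range(max_iters):
        k, dim = len(centers), len(centers[0])
        sums = [[0.0] * dim for _ in range(k)]
        counts = [0] * k
        # One full scan of the dataset: assign each point to its nearest center.
        for chunk in read_chunks():
            for p in chunk:
                j = min(range(k), key=lambda i: sq_dist(p, centers[i]))
                counts[j] += 1
                for d in range(dim):
                    sums[j][d] += p[d]
        passes += 1
        # Recompute each center as the mean of its points (keep empty clusters as-is).
        new_centers = [
            [s / counts[j] for s in sums[j]] if counts[j] else centers[j]
            for j in range(k)
        ]
        if new_centers == centers:  # converged: centers did not change
            break
        centers = new_centers
    return centers, passes


# Toy data standing in for chunks read from disk (an assumption for illustration).
data = [[(0.0,), (1.0,)], [(10.0,), (11.0,)]]
final_centers, n_passes = kmeans_passes(lambda: iter(data), [(0.0,), (1.0,)])
```

Even on this four-point example the loop scans the data several times before the centers stop changing, which is the cost that becomes prohibitive when every scan means reading a large dataset from disk.
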
In this paper, we present a new algorithm which typically
requires only one or a small number of passes on the entire
dataset, and provably produces the same cluster centers as
reported by the original k-means algorithm. The algorithm
uses sampling to create initial cluster centers, and then takes
one or more passes over the entire dataset to adjust these

  

Source: Agrawal, Gagan - Department of Computer Science and Engineering, Ohio State University
Jin, Ruoming - Department of Computer Science, Kent State University

 

Collections: Computer Technologies and Information Sciences