Smoothed Analysis of the k-Means Method
DAVID ARTHUR, Stanford University, Department of Computer Science
BODO MANTHEY, University of Twente, Department of Applied Mathematics
HEIKO RÖGLIN, University of Bonn, Department of Computer Science
The k-means method is one of the most widely used clustering algorithms, drawing its popularity from its
speed in practice. Recently, however, it was shown to have exponential worst-case running time. In order to
close the gap between practical performance and theoretical analysis, the k-means method has been studied
in the model of smoothed analysis. But even the smoothed analyses so far are unsatisfactory as the bounds
are still super-polynomial in the number n of data points.
In this paper, we settle the smoothed running time of the k-means method. We show that the smoothed
number of iterations is bounded by a polynomial in n and 1/σ, where σ is the standard deviation of the
Gaussian perturbations. This means that if an arbitrary input data set is randomly perturbed, then the
k-means method will run in expected polynomial time on that input set.
Categories and Subject Descriptors: F.2.0 [Analysis of Algorithms and Problem Complexity]: General
General Terms: Algorithms, Theory
Additional Key Words and Phrases: Data Clustering, k-Means Method, Smoothed Analysis
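The setting described in the abstract can be illustrated with a minimal sketch, which is not the authors' code: an input data set is perturbed by independent Gaussian noise with standard deviation σ, the k-means method (Lloyd's iteration) is run on the perturbed points, and the number of iterations is counted. The function names `perturb` and `kmeans_iterations`, the sample points, and the choice of the first two points as initial centers are assumptions made only for this illustration.

```python
import random


def perturb(points, sigma, rng):
    """Add independent Gaussian noise (std. dev. sigma) to every coordinate."""
    return [tuple(x + rng.gauss(0.0, sigma) for x in p) for p in points]


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_iterations(points, centers, max_iter=10_000):
    """Run the k-means method; return (iterations, final centers).

    An iteration is counted each time the assignment of points to
    centers changes. The method stops once the assignment is stable.
    """
    centers = list(centers)
    assignment = None
    for iteration in range(1, max_iter + 1):
        # Assignment step: each point goes to its nearest center.
        new_assignment = [
            min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points
        ]
        if new_assignment == assignment:
            return iteration - 1, centers
        assignment = new_assignment
        # Update step: each center moves to the mean of its cluster.
        for j in range(len(centers)):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # an empty cluster keeps its old center (assumption)
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return max_iter, centers


# Hypothetical input: six points in the plane, perturbed with sigma = 0.1.
rng = random.Random(0)
base = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0), (10.0, 1.0)]
noisy = perturb(base, sigma=0.1, rng=rng)
iters, final_centers = kmeans_iterations(noisy, centers=noisy[:2])
print("iterations on perturbed input:", iters)
```

Repeating the run over many random perturbations and averaging `iters` gives an empirical estimate of the smoothed number of iterations for that input; the sketch only demonstrates the experimental setup and says nothing about the polynomial bound itself.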
1. INTRODUCTION
Clustering is a fundamental problem in computer science with applications ranging
from biology to information retrieval and data compression. In a clustering problem,

  

Source: Al Hanbali, Ahmad - Department of Applied Mathematics, Universiteit Twente

 

Collections: Engineering