 
Smoothed Analysis of the k-Means Method
DAVID ARTHUR, Stanford University, Department of Computer Science
BODO MANTHEY, University of Twente, Department of Applied Mathematics
HEIKO RÖGLIN, University of Bonn, Department of Computer Science
The k-means method is one of the most widely used clustering algorithms, drawing its popularity from its
speed in practice. Recently, however, it was shown to have exponential worst-case running time. In order to
close the gap between practical performance and theoretical analysis, the k-means method has been studied
in the model of smoothed analysis. But even the smoothed analyses so far are unsatisfactory as the bounds
are still superpolynomial in the number n of data points.
In this paper, we settle the smoothed running time of the k-means method. We show that the smoothed
number of iterations is bounded by a polynomial in n and 1/σ, where σ is the standard deviation of the
Gaussian perturbations. This means that if an arbitrary input data set is randomly perturbed, then the
k-means method will run in expected polynomial time on that input set.
Categories and Subject Descriptors: F.2.0 [Analysis of Algorithms and Problem Complexity]: General
General Terms: Algorithms, Theory
Additional Key Words and Phrases: Data Clustering, k-Means Method, Smoothed Analysis
1. INTRODUCTION
Clustering is a fundamental problem in computer science with applications ranging
from biology to information retrieval and data compression. In a clustering problem,
