 
Summary: Random Matrices in Data Analysis
Dimitris Achlioptas
Microsoft Research, Redmond, WA 98052, USA
optas@microsoft.com
Abstract. We show how carefully crafted random matrices can achieve
distance-preserving dimensionality reduction, accelerate spectral compu-
tations, and reduce the sample complexity of certain kernel methods.
1 Introduction
Given a collection of n data points (vectors) in high-dimensional Euclidean
space, it is natural to ask whether they can be projected into a lower-dimensional
Euclidean space without suffering great distortion. Two particularly interesting
classes of projections are: i) projections that tend to preserve the interpoint
distances, and ii) projections that maximize the average projected vector length.
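A projection of type (i) can be sketched with a random Gaussian matrix, the simplest instance of this idea; this is a minimal illustration, not the specific constructions of the paper, and the dimensions, seed, and point count below are arbitrary choices for demonstration:

```python
import numpy as np

rng = np.random.default_rng(0)

# n points in d-dimensional Euclidean space, projected down to k dimensions.
n, d, k = 100, 1000, 300
X = rng.standard_normal((n, d))

# Random projection matrix with i.i.d. N(0, 1/k) entries, so that
# E[||Rx||^2] = ||x||^2 for every fixed vector x.
R = rng.standard_normal((d, k)) / np.sqrt(k)
Y = X @ R

# Compare one interpoint distance before and after projection.
orig = np.linalg.norm(X[0] - X[1])
proj = np.linalg.norm(Y[0] - Y[1])
ratio = proj / orig  # concentrates near 1 as k grows
```

With high probability all pairwise distances are preserved up to a small relative error, which is the sense of "without suffering great distortion" above.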
In the last few years, distance-preserving projections have had great impact in
theoretical computer science where they have been useful in a variety of algorith-
mic settings, such as approximate nearest neighbor search, clustering, learning
mixtures of distributions, and computing statistics of streamed data.
The general idea is that by providing a low-dimensional representation of the
data, distance-preserving embeddings dramatically speed up algorithms whose
runtime depends exponentially on the dimension of the working space. At the
