Summary: Random Matrices in Data Analysis
Microsoft Research, Redmond, WA 98052, USA
Abstract. We show how carefully crafted random matrices can achieve
distance-preserving dimensionality reduction, accelerate spectral compu-
tations, and reduce the sample complexity of certain kernel methods.
Given a collection of n data points (vectors) in high-dimensional Euclidean space,
it is natural to ask whether they can be projected into a lower-dimensional
Euclidean space without suffering great distortion. Two particularly interesting
classes of projections are: i) projections that tend to preserve the interpoint
distances, and ii) projections that maximize the average projected vector length.
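As a concrete illustration of the first class, the following sketch (assuming NumPy; the data and dimensions are arbitrary choices for illustration) projects points through a Gaussian random matrix scaled by 1/sqrt(k), in the spirit of the Johnson-Lindenstrauss lemma, and checks how well interpoint distances survive:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 50, 1000, 200          # n points, original dimension d, target dimension k
X = rng.standard_normal((n, d))  # synthetic data points, one per row

# Random projection matrix with i.i.d. N(0, 1/k) entries; the 1/sqrt(k)
# scaling makes projected squared distances correct in expectation.
R = rng.standard_normal((d, k)) / np.sqrt(k)
Y = X @ R                        # projected points, shape (n, k)

def pairwise_sq_dists(A):
    """All pairwise squared Euclidean distances between rows of A."""
    sq = np.sum(A ** 2, axis=1)
    return sq[:, None] + sq[None, :] - 2 * (A @ A.T)

orig = pairwise_sq_dists(X)
proj = pairwise_sq_dists(Y)
mask = ~np.eye(n, dtype=bool)    # ignore the zero diagonal
ratios = proj[mask] / orig[mask]

# Ratios concentrate around 1: distances are approximately preserved
# even though the dimension dropped from d = 1000 to k = 200.
print("distance ratio range:", ratios.min(), ratios.max())
```

The concentration of the ratios around 1 is exactly the distance-preservation property exploited by the algorithmic applications discussed next.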
In the last few years, distance-preserving projections have had great impact in
theoretical computer science where they have been useful in a variety of algorith-
mic settings, such as approximate nearest neighbor search, clustering, learning
mixtures of distributions, and computing statistics of streamed data.
The general idea is that by providing a low dimensional representation of the
data, distance-preserving embeddings dramatically speed up algorithms whose
run-time depends exponentially on the dimension of the working space. At the