 
Summary: Accounting for Boundary Effects in Nearest Neighbor
Searching
Sunil Arya
David M. Mount
Onuttom Narayan
Abstract
Given n data points in d-dimensional space, nearest neighbor searching involves
determining the nearest of these data points to a given query point. Most average-case
analyses of nearest neighbor searching algorithms are made under the simplifying
assumption that d is fixed and that n is so large relative to d that boundary effects can
be ignored. This means that for any query point the statistical distribution of the data
points surrounding it is independent of the location of the query point. However, in
many applications of nearest neighbor searching (such as data compression by vector
quantization) this assumption is not met, since the number of data points n grows
roughly as 2^d. Largely for this reason, the actual performances of many nearest neighbor
algorithms tend to be much better than their theoretical analyses would suggest. We
present evidence of why this is the case. We provide an accurate analysis of the number
of cells visited in nearest neighbor searching by the bucketing and k-d tree algorithms.
We assume md
