Summary: To appear in Proceedings of IEEE International Conference on Data Engineering (ICDE), April 2008.
Nearest Neighbor Retrieval Using Distance-Based Hashing
Vassilis Athitsos 1, Michalis Potamias 2, Panagiotis Papapetrou 2, and George Kollios 2
1 Computer Science and Engineering Department, University of Texas at Arlington
Arlington, Texas, USA
2 Computer Science Department, Boston University
Boston, Massachusetts, USA
Abstract: A method is proposed for indexing spaces with arbitrary
distance measures, so as to achieve efficient approximate
nearest neighbor retrieval. Hashing methods, such as Locality
Sensitive Hashing (LSH), have been successfully applied for
similarity indexing in vector spaces and string spaces under the
Hamming distance. The key novelty of the hashing technique
proposed here is that it can be applied to spaces with arbitrary
distance measures, including nonmetric distance measures. First,
we describe a domain-independent method for constructing a
family of binary hash functions. Then, we use these functions
to construct multiple multi-bit hash tables. We show that the
LSH formalism is not applicable for analyzing the behavior of