Summary: An Information-theoretic Measure for Document Similarity
Javed A. Aslam
Department of Computer Science
Dartmouth College
jaa@cs.dartmouth.edu
Meredith Frost
Department of Computer Science
Dartmouth College
Meredith.Frost@dartmouth.edu
ABSTRACT
Recent work has demonstrated that the assessment of pairwise
object similarity can be approached in an axiomatic manner
using information theory. We extend this concept specifically
to document similarity and test the effectiveness of an
information-theoretic measure for pairwise document similarity.
We adapt query retrieval to rate the quality of document
similarity measures and demonstrate that our proposed
information-theoretic measure for document similarity yields
statistically significant improvements over other popular
measures of similarity.