|
Summary: An Information-theoretic Measure for Document Similarity
Javed A. Aslam
Department of Computer Science
Dartmouth College
jaa@cs.dartmouth.edu
Meredith Frost
Department of Computer Science
Dartmouth College
Meredith.Frost@dartmouth.edu
ABSTRACT
Recent work has demonstrated that the assessment of pairwise
object similarity can be approached in an axiomatic
manner using information theory. We extend this concept
specifically to document similarity and test the effectiveness
of an information-theoretic measure for pairwise document
similarity. We adapt query retrieval to rate the quality
of document similarity measures and demonstrate that our
proposed information-theoretic measure for document similarity
yields statistically significant improvements over other
popular measures of similarity.
|