U.S. Department of Energy
Office of Scientific and Technical Information

Geometric comparison of popular mixture-model distances.

Conference · OSTI ID: 1027001
Statistical Latent Dirichlet Analysis produces mixture model data that are geometrically equivalent to points lying on a regular simplex in moderate to high dimensions. Numerous other statistical models and techniques also produce data in this geometric category, even though the meaning of the axes and coordinate values differs significantly. A distance function is used to further analyze these points, for example to cluster them. Several different distance functions are popular among statisticians; the choice is usually driven by the historical preference of the application domain, information-theoretic considerations, or the desirability of the clustering results. Relatively little consideration is usually given to how distance functions geometrically transform data, or to the distances' algebraic properties. Here we examine these issues, in the hope of providing complementary insight and inspiring further geometric thought. Several popular distances, χ², Jensen-Shannon divergence, and the square of the Hellinger distance, are shown to be nearly equivalent, both in terms of functional forms after transformations, factorizations, and series expansions, and in terms of the shape and proximity of constant-value contours. This is somewhat surprising given that their original functional forms look quite different. Cosine similarity is related to the square of the Euclidean distance between unit-normalized vectors, and a similar geometric relationship is shown between Hellinger and another cosine. We suggest a geodesic variation of Hellinger. The square-root projection that arises in Hellinger distance is briefly compared to standard normalization for Euclidean distance. We include detailed derivations of some ratio and difference bounds for illustrative purposes. We provide some constructions that nearly achieve the worst-case ratios, relevant for contours.
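The distances named in the abstract can be compared numerically. The following is a minimal sketch, not the authors' code, and it rests on several assumptions: the symmetric form of χ², natural logarithms for the Jensen-Shannon divergence, the 1/2-normalized squared Hellinger distance, and the arc length between square-root vectors as one possible reading of the "geodesic variation". The function names and the Dirichlet test points are hypothetical, chosen only for illustration.

```python
import numpy as np

def chi2(p, q):
    """Symmetric chi-squared (assumed convention): sum (p_i - q_i)^2 / (p_i + q_i)."""
    s = p + q
    mask = s > 0  # skip coordinates where both entries are zero
    return float(np.sum((p[mask] - q[mask]) ** 2 / s[mask]))

def jensen_shannon(p, q):
    """Jensen-Shannon divergence with natural logarithms (assumed convention)."""
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # 0 * log(0) is treated as 0
        return float(np.sum(a[mask] * np.log(a[mask] / b[mask])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def hellinger_sq(p, q):
    """Squared Hellinger distance (assumed convention): 0.5 * sum (sqrt(p_i) - sqrt(q_i))^2."""
    return 0.5 * float(np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

def hellinger_geodesic(p, q):
    """Angle between sqrt(p) and sqrt(q) on the unit sphere.

    This is only one plausible reading of a 'geodesic variation of
    Hellinger'; the paper's own definition may differ.
    """
    bc = float(np.sum(np.sqrt(p * q)))  # cosine of the angle between the sqrt vectors
    return float(np.arccos(np.clip(bc, -1.0, 1.0)))

# Two hypothetical mixture vectors on the 20-dimensional simplex.
rng = np.random.default_rng(0)
p, q = rng.dirichlet(np.ones(20), size=2)

c, j, h = chi2(p, q), jensen_shannon(p, q), hellinger_sq(p, q)
print("chi2 / Hellinger^2 :", c / h)
print("JS   / chi2        :", j / c)
print("geodesic Hellinger :", hellinger_geodesic(p, q))

# Square-root projection: sqrt(p) and sqrt(q) are unit vectors, so the
# squared Euclidean distance between them equals 2 * (1 - cosine).
u, v = np.sqrt(p), np.sqrt(q)
print("||u - v||^2        :", float(np.dot(u - v, u - v)))
print("2 * (1 - cos)      :", 2.0 * (1.0 - float(np.dot(u, v))))
```

Under these conventions the ratio of χ² to the squared Hellinger distance always lies between 2 and 4, because each term satisfies (p − q)² / (p + q) = (√p − √q)² (√p + √q)² / (p + q) and (√p + √q)² lies between p + q and 2(p + q). This is the kind of ratio bound the abstract refers to, although the paper's exact normalizations may differ.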
Research Organization:
Sandia National Laboratories
Sponsoring Organization:
USDOE
DOE Contract Number:
AC04-94AL85000
Report Number(s):
SAND2010-6286C
Country of Publication:
United States
Language:
English

Similar Records

Distance between quantum states in the presence of initial qubit-environment correlations: A comparative study
Journal Article · Thu Sep 15 00:00:00 EDT 2011 · Physical Review. A · OSTI ID:22068641

Sigma-model formulation of Yang-Mills theory on a four-dimensional hypersphere: geodesics as paths
Journal Article · Mon Jan 31 23:00:00 EST 1983 · Sov. J. Nucl. Phys. (Engl. Transl.); (United States) · OSTI ID:5595000

Measuring Thermodynamic Length
Journal Article · Fri Sep 07 00:00:00 EDT 2007 · Physical Review Letters · OSTI ID:960377