skip to main content
OSTI.GOV title logo U.S. Department of Energy
Office of Scientific and Technical Information

Title: Geometric comparison of popular mixture-model distances.

Abstract

Statistical Latent Dirichlet Analysis produces mixture model data that are geometrically equivalent to points lying on a regular simplex in moderate to high dimensions. Numerous other statistical models and techniques also produce data in this geometric category, even though the meaning of the axes and coordinate values differs significantly. A distance function is used to further analyze these points, for example to cluster them. Several different distance functions are popular amongst statisticians; which distance function is chosen is usually driven by the historical preference of the application domain, information-theoretic considerations, or by the desirability of the clustering results. Relatively little consideration is usually given to how distance functions geometrically transform data, or the distances algebraic properties. Here we take a look at these issues, in the hope of providing complementary insight and inspiring further geometric thought. Several popular distances, {chi}{sup 2}, Jensen - Shannon divergence, and the square of the Hellinger distance, are shown to be nearly equivalent; in terms of functional forms after transformations, factorizations, and series expansions; and in terms of the shape and proximity of constant-value contours. This is somewhat surprising given that their original functional forms look quite different. Cosine similarity is the square of themore » Euclidean distance, and a similar geometric relationship is shown with Hellinger and another cosine. We suggest a geodesic variation of Hellinger. The square-root projection that arises in Hellinger distance is briefly compared to standard normalization for Euclidean distance. We include detailed derivations of some ratio and difference bounds for illustrative purposes. We provide some constructions that nearly achieve the worst-case ratios, relevant for contours.« less

Authors:
Publication Date:
Research Org.:
Sandia National Laboratories (SNL), Albuquerque, NM, and Livermore, CA (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1027001
Report Number(s):
SAND2010-6286C
TRN: US201121%%211
DOE Contract Number:  
AC04-94AL85000
Resource Type:
Conference
Resource Relation:
Conference: Proposed for presentation at the Foundations of Topological Analysis, VisWeek 2010 held October 24-29, 2010 in Salt Lake City, UT.
Country of Publication:
United States
Language:
English
Subject:
99 GENERAL AND MISCELLANEOUS//MATHEMATICS, COMPUTING, AND INFORMATION SCIENCE; DIMENSIONS; FUNCTIONALS; GEODESICS; MIXTURES; SERIES EXPANSION; SHAPE; STATISTICAL MODELS; TRANSFORMATIONS

Citation Formats

Mitchell, Scott A. Geometric comparison of popular mixture-model distances.. United States: N. p., 2010. Web.
Mitchell, Scott A. Geometric comparison of popular mixture-model distances.. United States.
Mitchell, Scott A. 2010. "Geometric comparison of popular mixture-model distances.". United States.
@article{osti_1027001,
title = {Geometric comparison of popular mixture-model distances.},
author = {Mitchell, Scott A},
abstractNote = {Statistical Latent Dirichlet Analysis produces mixture model data that are geometrically equivalent to points lying on a regular simplex in moderate to high dimensions. Numerous other statistical models and techniques also produce data in this geometric category, even though the meaning of the axes and coordinate values differs significantly. A distance function is used to further analyze these points, for example to cluster them. Several different distance functions are popular amongst statisticians; which distance function is chosen is usually driven by the historical preference of the application domain, information-theoretic considerations, or by the desirability of the clustering results. Relatively little consideration is usually given to how distance functions geometrically transform data, or the distances algebraic properties. Here we take a look at these issues, in the hope of providing complementary insight and inspiring further geometric thought. Several popular distances, {chi}{sup 2}, Jensen - Shannon divergence, and the square of the Hellinger distance, are shown to be nearly equivalent; in terms of functional forms after transformations, factorizations, and series expansions; and in terms of the shape and proximity of constant-value contours. This is somewhat surprising given that their original functional forms look quite different. Cosine similarity is the square of the Euclidean distance, and a similar geometric relationship is shown with Hellinger and another cosine. We suggest a geodesic variation of Hellinger. The square-root projection that arises in Hellinger distance is briefly compared to standard normalization for Euclidean distance. We include detailed derivations of some ratio and difference bounds for illustrative purposes. We provide some constructions that nearly achieve the worst-case ratios, relevant for contours.},
doi = {},
url = {https://www.osti.gov/biblio/1027001}, journal = {},
number = ,
volume = ,
place = {United States},
year = {Wed Sep 01 00:00:00 EDT 2010},
month = {Wed Sep 01 00:00:00 EDT 2010}
}

Conference:
Other availability
Please see Document Availability for additional information on obtaining the full-text document. Library patrons may search WorldCat to identify libraries that hold this conference proceeding.

Save / Share: