DOE PAGES · U.S. Department of Energy
Office of Scientific and Technical Information

Title: Convergence Rates for Empirical Estimation of Binary Classification Bounds

Journal Article · Entropy
DOI: https://doi.org/10.3390/e21121144 · OSTI ID:1801119
[1]; [2]; [3]; [2]
  1. Univ. of Maine, Orono, ME (United States). School of Computing and Information Science
  2. Univ. of Michigan, Ann Arbor, MI (United States). Dept. of Electrical Engineering and Computer Science
  3. Utah State Univ., Logan, UT (United States). Dept. of Mathematics and Statistics

Bounding the best achievable error probability for binary classification problems is relevant to many applications, including machine learning, signal processing, and information theory. Many bounds on the Bayes binary classification error rate depend on information divergences between the pair of class distributions. Recently, the Henze–Penrose (HP) divergence has been proposed for bounding classification error probability. We consider the problem of empirically estimating the HP divergence from random samples. We derive a bound on the convergence rate for the Friedman–Rafsky (FR) estimator of the HP divergence, which is related to a multivariate runs statistic for testing between two distributions. The FR estimator is derived from a multicolored Euclidean minimal spanning tree (MST) that spans the merged samples. We also obtain a concentration inequality for the FR estimator of the HP divergence. We validate our results experimentally and illustrate their application to real datasets.
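The FR construction summarized above can be illustrated with a short sketch. This is not the authors' code. It assumes the estimator normalization and error bounds from Berisha et al., "Empirically Estimable Classification Bounds Based on a Nonparametric Divergence Measure" (2016), which this record does not state: for samples of sizes m and n, build a Euclidean MST on the merged sample, let R_mn be the number of edges joining a point of one sample to a point of the other, and estimate D_hat = 1 - R_mn (m + n) / (2 m n). With class priors p and q = 1 - p and u_p = 4 p q D_p + (p - q)^2, the Bayes error is bounded by 1/2 - 1/2 sqrt(u_p) <= error <= 1/2 - 1/2 u_p. The function names (euclidean_mst_edges, fr_hp_divergence, bayes_error_bounds) are hypothetical, and the MST uses a simple O(N^2) Prim's algorithm for clarity.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def euclidean_mst_edges(points: List[Point]) -> List[Tuple[int, int]]:
    """Edges (i, j) of a Euclidean minimal spanning tree, via O(N^2) Prim's algorithm."""
    n = len(points)
    if n < 2:
        return []
    in_tree = [False] * n
    best_dist = [math.inf] * n  # distance from each vertex to the growing tree
    best_parent = [-1] * n      # tree vertex that attains best_dist
    best_dist[0] = 0.0
    edges: List[Tuple[int, int]] = []
    for _ in range(n):
        # Add the non-tree vertex closest to the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best_dist[i])
        in_tree[u] = True
        if best_parent[u] >= 0:
            edges.append((best_parent[u], u))
        # Update distances through the newly added vertex.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best_dist[v]:
                    best_dist[v] = d
                    best_parent[v] = u
    return edges


def fr_hp_divergence(xs: List[Point], ys: List[Point]) -> float:
    """Friedman-Rafsky estimate of the HP divergence: 1 - R_mn * (m + n) / (2 m n)."""
    m, n = len(xs), len(ys)
    if m == 0 or n == 0:
        raise ValueError("both samples must be non-empty")
    merged = list(xs) + list(ys)
    labels = [0] * m + [1] * n  # 0 = sample from f, 1 = sample from g
    # R_mn: number of "dichromatic" MST edges joining an X point to a Y point.
    r = sum(1 for a, b in euclidean_mst_edges(merged) if labels[a] != labels[b])
    return 1.0 - r * (m + n) / (2.0 * m * n)


def bayes_error_bounds(d_hp: float, p: float) -> Tuple[float, float]:
    """(lower, upper) bounds on the Bayes error for class priors p and q = 1 - p."""
    q = 1.0 - p
    u = 4.0 * p * q * d_hp + (p - q) ** 2
    u = min(max(u, 0.0), 1.0)  # finite-sample estimates can push u outside [0, 1]
    return 0.5 - 0.5 * math.sqrt(u), 0.5 - 0.5 * u


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(200)]
    ys_same = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(200)]
    ys_far = [(rng.gauss(4.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(200)]
    for name, ys in [("same distribution", ys_same), ("shifted mean", ys_far)]:
        d = fr_hp_divergence(xs, ys)
        lo, hi = bayes_error_bounds(d, p=0.5)
        print(f"{name}: D_hat={d:.3f}, Bayes error in [{lo:.3f}, {hi:.3f}]")
```

Two design points are worth noting. For finite samples the estimate is not confined to [0, 1]: perfectly interleaved samples make every MST edge dichromatic and can drive D_hat below zero, which is why the bound helper clips u. The O(N^2) Prim loop is only practical for small merged samples; for larger ones, a graph-library MST routine such as SciPy's scipy.sparse.csgraph.minimum_spanning_tree is a common substitute.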

Research Organization:
Georgia Institute of Technology, Atlanta, GA (United States); Univ. of Michigan, Ann Arbor, MI (United States)
Sponsoring Organization:
USDOE National Nuclear Security Administration (NNSA)
Grant/Contract Number:
NA0002534; NA0003921
OSTI ID:
1801119
Journal Information:
Entropy, Vol. 21, Issue 12; ISSN 1099-4300; CODEN ENTRFG
Publisher:
MDPI
Country of Publication:
United States
Language:
English
