
DOE PAGES

Title: A fast and objective multidimensional kernel density estimation method: fastKDE

Numerous facets of scientific research implicitly or explicitly call for the estimation of probability densities. Histograms and kernel density estimates (KDEs) are two commonly used techniques for estimating such information, with the KDE generally providing a higher fidelity representation of the probability density function (PDF). Both methods require specification of either a bin width or a kernel bandwidth. While techniques exist for choosing the kernel bandwidth optimally and objectively, they are computationally intensive, since they require repeated calculation of the KDE. A solution for objectively and optimally choosing both the kernel shape and width has recently been developed by Bernacchia and Pigolotti (2011). While this solution theoretically applies to multidimensional KDEs, it has not been clear how to practically do so. A method for practically extending the Bernacchia-Pigolotti KDE to multidimensions is introduced. This multidimensional extension is combined with a recently-developed computational improvement to their method that makes it computationally efficient: a 2D KDE on 10^5 samples only takes 1 s on a modern workstation. This fast and objective KDE method, called the fastKDE method, retains the excellent statistical convergence properties that have been demonstrated for univariate samples. The fastKDE method exhibits statistical accuracy that is comparable to state-of-the-science KDE methods publicly available in R, and it produces kernel density estimates several orders of magnitude faster. The fastKDE method does an excellent job of encoding covariance information for bivariate samples. This property allows for direct calculation of conditional PDFs with fastKDE. It is demonstrated how this capability might be leveraged for detecting non-trivial relationships between quantities in physical systems, such as transitional behavior.
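The abstract builds on the Bernacchia-Pigolotti (2011) self-consistent estimator. The following is a minimal one-dimensional sketch of that estimator for illustration only. It is not the authors' fastKDE code, which adds the multidimensional extension and a faster ECF computation (the subject terms mention a nonuniform FFT). The sketch works in four steps:

1. Compute the empirical characteristic function C(t) = (1/N) sum_j exp(i t x_j) by direct summation.
2. Keep only the contiguous band of frequencies around t = 0 where |C(t)|^2 >= 4(N-1)/N^2.
3. Rescale C(t) within that band as phi_hat(t) = N C(t) / (2(N-1)) * [1 + sqrt(1 - 4(N-1) / (N^2 |C(t)|^2))].
4. Invert the Fourier transform numerically.

The function names `ecf` and `bp_self_consistent_kde`, the frequency spacing `dt`, the cap `max_k`, and the choice of a periodic domain four times the data range are all hypothetical, illustrative assumptions.

```python
import cmath
import math
import random


def ecf(samples, t):
    """Empirical characteristic function C(t) = (1/N) * sum_j exp(i t x_j)."""
    return sum(cmath.exp(1j * t * x) for x in samples) / len(samples)


def bp_self_consistent_kde(samples, x_grid, dt, max_k=10_000):
    """Hypothetical sketch of the 1D Bernacchia-Pigolotti self-consistent estimate.

    Direct O(N) summation per frequency stands in for the faster ECF
    computation used by fastKDE; dt and max_k are illustrative choices.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    c_min2 = 4.0 * (n - 1) / n**2  # |C|^2 threshold; phi_hat = 0 below it

    phi_hat = {}  # integer k -> estimated characteristic function at t_k = k * dt
    for k in range(max_k + 1):
        c = ecf(samples, k * dt)
        mag2 = abs(c) ** 2
        if mag2 < c_min2:
            break  # end of the contiguous band of frequencies around t = 0
        factor = n / (2.0 * (n - 1)) * (1.0 + math.sqrt(1.0 - c_min2 / mag2))
        phi_hat[k] = factor * c
        if k > 0:
            phi_hat[-k] = phi_hat[k].conjugate()  # C(-t) = conj(C(t)) for real data

    # Inverse transform: f(x) ~ (dt / 2 pi) * sum_k phi_hat(t_k) * exp(-i t_k x)
    density = []
    for x in x_grid:
        total = sum(p * cmath.exp(-1j * k * dt * x) for k, p in phi_hat.items())
        density.append(dt / (2.0 * math.pi) * total.real)
    return density


# Usage: standard-normal samples on an illustrative periodic domain.
random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(1000)]
period = 4.0 * (max(data) - min(data))  # assumed domain: 4x the data range
dt = 2.0 * math.pi / period             # frequency spacing matching that period
m = 400
xs = [-period / 2 + period * i / m for i in range(m)]
f = bp_self_consistent_kde(data, xs, dt)
print(sum(f) * period / m)  # integral over one period; very close to 1.0
```

At t = 0 the rescaling factor reduces exactly to 1, so the estimate integrates to one over the periodic domain. The cost of direct summation grows with both the sample count and the number of evaluation points. The abstract's "computational improvement" is aimed at that cost.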
Authors:
O'Brien, Travis A. [1]; Kashinath, Karthik [2]; Cavanaugh, Nicholas R. [2]; Collins, William D. [3]; O'Brien, John P. [4]
  1. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Univ. of California, Davis, CA (United States)
  2. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
  3. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Univ. of California, Berkeley, CA (United States)
  4. Univ. of California, Santa Cruz, CA (United States)
Publication Date:
March 2016
Grant/Contract Number:
AC02-05CH11231
Type:
Published Article
Journal Name:
Computational Statistics and Data Analysis (Print)
Additional Journal Information:
Journal Name: Computational Statistics and Data Analysis (Print); Journal Volume: 101; Journal Issue: C; Journal ID: ISSN 0167-9473
Research Org:
Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Sponsoring Org:
USDOE Office of Science (SC), Biological and Environmental Research (BER) (SC-23)
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; Empirical characteristic function; ECF; Kernel density estimation; Histogram; Nonuniform FFT; NuFFT; Multidimensional; KDE
OSTI Identifier:
1305435
Alternate Identifier(s):
OSTI ID: 1435070

O'Brien, Travis A., Kashinath, Karthik, Cavanaugh, Nicholas R., Collins, William D., and O'Brien, John P. A fast and objective multidimensional kernel density estimation method: fastKDE. United States: N. p., 2016. Web. doi:10.1016/j.csda.2016.02.014.
O'Brien, Travis A., Kashinath, Karthik, Cavanaugh, Nicholas R., Collins, William D., & O'Brien, John P. A fast and objective multidimensional kernel density estimation method: fastKDE. United States. doi:10.1016/j.csda.2016.02.014.
O'Brien, Travis A., Kashinath, Karthik, Cavanaugh, Nicholas R., Collins, William D., and O'Brien, John P. 2016. "A fast and objective multidimensional kernel density estimation method: fastKDE". United States. doi:10.1016/j.csda.2016.02.014.
@article{osti_1305435,
title = {A fast and objective multidimensional kernel density estimation method: fastKDE},
author = {O'Brien, Travis A. and Kashinath, Karthik and Cavanaugh, Nicholas R. and Collins, William D. and O'Brien, John P.},
abstractNote = {Numerous facets of scientific research implicitly or explicitly call for the estimation of probability densities. Histograms and kernel density estimates (KDEs) are two commonly used techniques for estimating such information, with the KDE generally providing a higher fidelity representation of the probability density function (PDF). Both methods require specification of either a bin width or a kernel bandwidth. While techniques exist for choosing the kernel bandwidth optimally and objectively, they are computationally intensive, since they require repeated calculation of the KDE. A solution for objectively and optimally choosing both the kernel shape and width has recently been developed by Bernacchia and Pigolotti (2011). While this solution theoretically applies to multidimensional KDEs, it has not been clear how to practically do so. A method for practically extending the Bernacchia-Pigolotti KDE to multidimensions is introduced. This multidimensional extension is combined with a recently-developed computational improvement to their method that makes it computationally efficient: a 2D KDE on $10^5$ samples only takes 1 s on a modern workstation. This fast and objective KDE method, called the fastKDE method, retains the excellent statistical convergence properties that have been demonstrated for univariate samples. The fastKDE method exhibits statistical accuracy that is comparable to state-of-the-science KDE methods publicly available in R, and it produces kernel density estimates several orders of magnitude faster. The fastKDE method does an excellent job of encoding covariance information for bivariate samples. This property allows for direct calculation of conditional PDFs with fastKDE. It is demonstrated how this capability might be leveraged for detecting non-trivial relationships between quantities in physical systems, such as transitional behavior.},
doi = {10.1016/j.csda.2016.02.014},
journal = {Computational Statistics and Data Analysis (Print)},
number = {C},
volume = 101,
place = {United States},
year = {2016},
month = {3}
}