A fast and objective multidimensional kernel density estimation method: fastKDE
Abstract
Numerous facets of scientific research implicitly or explicitly call for the estimation of probability densities. Histograms and kernel density estimates (KDEs) are two commonly used techniques for estimating such information, with the KDE generally providing a higher fidelity representation of the probability density function (PDF). Both methods require specification of either a bin width or a kernel bandwidth. While techniques exist for choosing the kernel bandwidth optimally and objectively, they are computationally intensive, since they require repeated calculation of the KDE. A solution for objectively and optimally choosing both the kernel shape and width has recently been developed by Bernacchia and Pigolotti (2011). While this solution theoretically applies to multidimensional KDEs, it has not been clear how to practically do so. A method for practically extending the Bernacchia-Pigolotti KDE to multidimensions is introduced. This multidimensional extension is combined with a recently-developed computational improvement to their method that makes it computationally efficient: a 2D KDE on 10^5 samples only takes 1 s on a modern workstation. This fast and objective KDE method, called the fastKDE method, retains the excellent statistical convergence properties that have been demonstrated for univariate samples. The fastKDE method exhibits statistical accuracy that is comparable to state-of-the-science KDE methods publicly available in R, and it produces kernel density estimates several orders of magnitude faster. The fastKDE method does an excellent job of encoding covariance information for bivariate samples. This property allows for direct calculation of conditional PDFs with fastKDE. It is demonstrated how this capability might be leveraged for detecting non-trivial relationships between quantities in physical systems, such as transitional behavior.
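As a rough illustration of why the bandwidth choice matters, the sketch below evaluates a plain fixed-bandwidth Gaussian KDE for two hand-picked bandwidths. It is not the fastKDE or Bernacchia-Pigolotti algorithm, which selects the kernel shape and width objectively; the function name `gaussian_kde_1d`, the synthetic sample, and both bandwidth values are assumptions made only for this example.

```python
import numpy as np


def gaussian_kde_1d(samples, grid, bandwidth):
    """Fixed-bandwidth Gaussian KDE evaluated at each grid point.

    Illustrative stand-in only: the bandwidth is supplied by the caller,
    whereas an objective method would select it from the data.
    """
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Scaled distance between every grid point and every sample.
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernel = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)
    # Average the kernels and rescale so the estimate integrates to one.
    return kernel.mean(axis=1) / bandwidth


# Synthetic standard-normal sample (assumed data, not from the paper).
rng = np.random.default_rng(0)
samples = rng.normal(size=2000)

grid = np.linspace(-6.0, 6.0, 601)
dx = grid[1] - grid[0]

# Two hypothetical, hand-picked bandwidths.
narrow = gaussian_kde_1d(samples, grid, bandwidth=0.05)
wide = gaussian_kde_1d(samples, grid, bandwidth=1.0)

# Reference density for comparison.
true_pdf = np.exp(-0.5 * grid**2) / np.sqrt(2.0 * np.pi)

# Mean absolute error of each estimate against the reference density.
err_narrow = np.mean(np.abs(narrow - true_pdf))
err_wide = np.mean(np.abs(wide - true_pdf))
print(f"narrow bandwidth error: {err_narrow:.4f}")
print(f"wide bandwidth error:   {err_wide:.4f}")
```

With the narrow bandwidth the estimate tends to follow sampling noise, while the wide one oversmooths the peak. Searching over bandwidths by hand or by cross-validation means recomputing the KDE repeatedly, which is the cost that objective selection methods aim to avoid.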
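- Authors:
- O’Brien, Travis A.; Kashinath, Karthik; Cavanaugh, Nicholas R.; Collins, William D.; O’Brien, John P.
- Publication Date:
- 2016-09-01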
- Research Org.:
- Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
- Sponsoring Org.:
- USDOE Office of Science (SC), Biological and Environmental Research (BER)
- OSTI Identifier:
- 1305435
- Alternate Identifier(s):
- OSTI ID: 1435070
- Grant/Contract Number:
- AC02-05CH11231
- Resource Type:
- Published Article
- Journal Name:
- Computational Statistics and Data Analysis (Print)
- Additional Journal Information:
- Journal Name: Computational Statistics and Data Analysis (Print); Journal Volume: 101; Journal Issue: C; Journal ID: ISSN 0167-9473
- Country of Publication:
- Netherlands
- Language:
- English
- Subject:
- 97 MATHEMATICS AND COMPUTING; Empirical characteristic function; ECF; Kernel density estimation; Histogram; Nonuniform FFT; NuFFT; Multidimensional; KDE
Citation Formats
O’Brien, Travis A., Kashinath, Karthik, Cavanaugh, Nicholas R., Collins, William D., and O’Brien, John P. A fast and objective multidimensional kernel density estimation method: fastKDE. Netherlands: N. p., 2016. Web. doi:10.1016/j.csda.2016.02.014.
O’Brien, Travis A., Kashinath, Karthik, Cavanaugh, Nicholas R., Collins, William D., & O’Brien, John P. (2016). A fast and objective multidimensional kernel density estimation method: fastKDE. Netherlands. https://doi.org/10.1016/j.csda.2016.02.014
O’Brien, Travis A., Kashinath, Karthik, Cavanaugh, Nicholas R., Collins, William D., and O’Brien, John P. 2016. "A fast and objective multidimensional kernel density estimation method: fastKDE". Netherlands. https://doi.org/10.1016/j.csda.2016.02.014.
@article{osti_1305435,
title = {A fast and objective multidimensional kernel density estimation method: fastKDE},
author = {O’Brien, Travis A. and Kashinath, Karthik and Cavanaugh, Nicholas R. and Collins, William D. and O’Brien, John P.},
abstractNote = {Numerous facets of scientific research implicitly or explicitly call for the estimation of probability densities. Histograms and kernel density estimates (KDEs) are two commonly used techniques for estimating such information, with the KDE generally providing a higher fidelity representation of the probability density function (PDF). Both methods require specification of either a bin width or a kernel bandwidth. While techniques exist for choosing the kernel bandwidth optimally and objectively, they are computationally intensive, since they require repeated calculation of the KDE. A solution for objectively and optimally choosing both the kernel shape and width has recently been developed by Bernacchia and Pigolotti (2011). While this solution theoretically applies to multidimensional KDEs, it has not been clear how to practically do so. A method for practically extending the Bernacchia-Pigolotti KDE to multidimensions is introduced. This multidimensional extension is combined with a recently-developed computational improvement to their method that makes it computationally efficient: a 2D KDE on $10^5$ samples only takes 1 s on a modern workstation. This fast and objective KDE method, called the fastKDE method, retains the excellent statistical convergence properties that have been demonstrated for univariate samples. The fastKDE method exhibits statistical accuracy that is comparable to state-of-the-science KDE methods publicly available in R, and it produces kernel density estimates several orders of magnitude faster. The fastKDE method does an excellent job of encoding covariance information for bivariate samples. This property allows for direct calculation of conditional PDFs with fastKDE. It is demonstrated how this capability might be leveraged for detecting non-trivial relationships between quantities in physical systems, such as transitional behavior.},
doi = {10.1016/j.csda.2016.02.014},
journal = {Computational Statistics and Data Analysis (Print)},
number = C,
volume = 101,
place = {Netherlands},
year = {2016},
month = {9}
}
https://doi.org/10.1016/j.csda.2016.02.014
Works referenced in this record:
A review of cloud top height and optical depth histograms from MISR, ISCCP, and MODIS
journal, January 2010
- Marchand, Roger; Ackerman, Thomas; Smyth, Mike
- Journal of Geophysical Research, Vol. 115, Issue D16
Self-consistent method for density estimation
journal, April 2011
- Bernacchia, Alberto; Pigolotti, Simone
- Journal of the Royal Statistical Society: Series B (Statistical Methodology), Vol. 73, Issue 3
Cross-validation Bandwidth Matrices for Multivariate Kernel Density Estimation
journal, September 2005
- Duong, Tarn; Hazelton, Martin L.
- Scandinavian Journal of Statistics, Vol. 32, Issue 3
Small-Scale and Mesoscale Variability in Cloudy Boundary Layers: Joint Probability Density Functions
journal, December 2002
- Larson, Vincent E.; Golaz, Jean-Christophe; Cotton, William R.
- Journal of the Atmospheric Sciences, Vol. 59, Issue 24
‘All models are wrong...’: an introduction to model uncertainty
journal, July 2012
- Wit, Ernst; Heuvel, Edwin van den; Romeijn, Jan-Willem
- Statistica Neerlandica, Vol. 66, Issue 3
On dynamic and thermodynamic components of cloud changes
journal, March 2004
- Bony, S.; Dufresne, J. -L.; Le Treut, H.
- Climate Dynamics, Vol. 22, Issue 2-3
Transformations in Density Estimation
journal, June 1991
- Wand, M. P.; Marron, J. S.; Ruppert, D.
- Journal of the American Statistical Association, Vol. 86, Issue 414
Reducing the computational cost of the ECF using a nuFFT: A fast and objective probability density estimation method
journal, November 2014
- O’Brien, Travis A.; Collins, William D.; Rauscher, Sara A.
- Computational Statistics & Data Analysis, Vol. 79
Simulation of the 1976/77 Climate Transition over the North Pacific: Sensitivity to Tropical Forcing
journal, December 2006
- Deser, Clara; Phillips, Adam S.
- Journal of Climate, Vol. 19, Issue 23
Stratiform Rain in the Tropics as Seen by the TRMM Precipitation Radar
journal, June 2003
- Schumacher, Courtney; Houze, Robert A.
- Journal of Climate, Vol. 16, Issue 11
Global warming and changes in risk of concurrent climate extremes: Insights from the 2014 California drought
journal, December 2014
- AghaKouchak, Amir; Cheng, Linyin; Mazdiyasni, Omid
- Geophysical Research Letters, Vol. 41, Issue 24
The World's Technological Capacity to Store, Communicate, and Compute Information
journal, February 2011
- Hilbert, M.; Lopez, P.
- Science, Vol. 332, Issue 6025
Bandwidth selection for kernel density estimation: a review of fully automatic selectors
journal, June 2013
- Heidenreich, Nils-Bastian; Schindler, Anja; Sperlich, Stefan
- AStA Advances in Statistical Analysis, Vol. 97, Issue 4
Self-Consistent Density Estimation
journal, June 2014
- Luedicke, Joerg; Bernacchia, Alberto
- The Stata Journal: Promoting communications on statistics and Stata, Vol. 14, Issue 2
Improvements to NOAA’s Historical Merged Land–Ocean Surface Temperature Analysis (1880–2006)
journal, May 2008
- Smith, Thomas M.; Reynolds, Richard W.; Peterson, Thomas C.
- Journal of Climate, Vol. 21, Issue 10