U.S. Department of Energy
Office of Scientific and Technical Information

Exact Gaussian processes for massive datasets via non-stationary sparsity-discovering kernels

Journal Article · Scientific Reports

Abstract

A Gaussian process (GP) is a prominent mathematical framework for stochastic function approximation in science and engineering applications. Its success is largely attributed to the GP’s analytical tractability, robustness, and natural inclusion of uncertainty quantification. Unfortunately, the use of exact GPs is prohibitively expensive for large datasets due to their unfavorable numerical complexity of $O(N^3)$ in computation and $O(N^2)$ in storage. All existing methods addressing this issue utilize some form of approximation, usually considering subsets of the full dataset or finding representative pseudo-points that render the covariance matrix well-structured and sparse. These approximate methods can lead to inaccuracies in function approximations and often limit the user’s flexibility in designing expressive kernels. Instead of inducing sparsity via data-point geometry and structure, we propose to take advantage of naturally occurring sparsity by allowing the kernel to discover, rather than induce, sparse structure. The premise of this paper is that the datasets and physical processes modeled by GPs often exhibit natural or implicit sparsities, but commonly used kernels do not allow us to exploit such sparsity. The core concept of GPs that are exact and at the same time sparse relies on kernel definitions that provide enough flexibility to learn and encode not only non-zero but also zero covariances. This principle of ultra-flexible, compactly supported, and non-stationary kernels, combined with HPC and constrained optimization, lets us scale exact GPs well beyond 5 million data points.
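To make the idea of a kernel that encodes exact zero covariances concrete, here is a minimal sketch. It is not the non-stationary kernel proposed in the paper: it swaps in a standard stationary Wendland kernel, which is compactly supported and positive definite in up to three dimensions. The function names, the `support_radius` parameter, and the dictionary storage are assumptions made for illustration only.

```python
import random


def wendland_kernel(x1, x2, support_radius=0.1, signal_variance=1.0):
    """Compactly supported Wendland kernel for scalar inputs.

    Returns exactly 0.0 whenever |x1 - x2| >= support_radius, so distant
    points contribute no entry to the covariance matrix.
    """
    r = abs(x1 - x2) / support_radius
    if r >= 1.0:
        return 0.0
    return signal_variance * (1.0 - r) ** 4 * (4.0 * r + 1.0)


def sparse_covariance(points, **kernel_args):
    """Keep only the non-zero covariance entries as {(i, j): value}."""
    entries = {}
    for i, xi in enumerate(points):
        for j, xj in enumerate(points):
            value = wendland_kernel(xi, xj, **kernel_args)
            if value != 0.0:
                entries[(i, j)] = value
    return entries


# Hypothetical data: 200 random inputs on the unit interval.
random.seed(0)
points = [random.random() for _ in range(200)]
cov = sparse_covariance(points, support_radius=0.05)
density = len(cov) / len(points) ** 2
print(f"stored {len(cov)} of {len(points) ** 2} entries (density {density:.3f})")
```

Because the zeros are exact rather than thresholded, the stored matrix is still the true covariance under this kernel, so no approximation is introduced. The sketch itself still evaluates every pair of points; a large-scale implementation would need to avoid that, for example by distributing the work as the abstract's mention of HPC suggests.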

Sponsoring Organization:
USDOE
Grant/Contract Number:
AC02-05CH11231
OSTI ID:
1963337
Alternate ID(s):
OSTI ID: 2229343
Journal Information:
Scientific Reports, Vol. 13, Issue 1; ISSN 2045-2322
Publisher:
Nature Publishing Group
Country of Publication:
United Kingdom
Language:
English
