DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Cholesky-based experimental design for Gaussian process and kernel-based emulation and calibration.

Abstract

Gaussian processes and other kernel-based methods are used extensively to construct approximations of multivariate data sets. The accuracy of these approximations depends on the training data used. This paper presents a computationally efficient algorithm that greedily selects training samples to minimize the weighted Lp error of kernel-based approximations for a given number of data points. The method successively generates nested samples, with the goal of minimizing the error in high-probability regions of user-specified densities. The algorithm is simple and can be implemented using existing pivoted Cholesky factorization methods. Training samples are generated in batches, which allows training data to be evaluated (labeled) in parallel. For smooth kernels, the algorithm performs comparably with the greedy integrated variance design but has significantly lower complexity. Numerical experiments demonstrate the efficacy of the approach for bounded, unbounded, multi-modal, and non-tensor-product densities. We also show how to use the proposed algorithm to efficiently generate surrogates for inferring unknown model parameters from data using Bayesian inference.
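The abstract states that the design can be built on pivoted Cholesky factorization. The following is a minimal Python sketch of that general idea, not the authors' implementation. It rests on assumptions that do not come from this record: a finite set of candidate points, a squared-exponential kernel, and a density weighting applied as sqrt(w_i) K_ij sqrt(w_j). The names `squared_exponential_kernel` and `pivoted_cholesky_design` are hypothetical.

```python
import numpy as np


def squared_exponential_kernel(X, Y, length_scale=0.3):
    """Squared-exponential kernel matrix between the rows of X and Y (assumed kernel)."""
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dists / length_scale**2)


def pivoted_cholesky_design(K, num_samples, weights=None, tol=1e-12):
    """Greedily pick candidate indices with a diagonally pivoted Cholesky factorization.

    K       : (n, n) kernel matrix evaluated on the candidate points.
    weights : optional (n,) non-negative density values at the candidates.
              Assumption of this sketch: the weighted kernel is
              sqrt(w_i) * K_ij * sqrt(w_j).
    Returns the selected indices (in selection order) and the partial factor L.
    """
    n = K.shape[0]
    if weights is not None:
        w = np.sqrt(weights)
        K = w[:, None] * K * w[None, :]
    # Diagonal of the residual (Schur complement); initially the kernel diagonal.
    diag = np.diag(K).astype(float).copy()
    L = np.zeros((n, num_samples))
    pivots = []
    for m in range(num_samples):
        p = int(np.argmax(diag))  # candidate with the largest remaining weighted variance
        if diag[p] <= tol:        # remaining residual is numerically zero
            break
        # New Cholesky column: residual column p scaled by the square root of the pivot.
        col = (K[:, p] - L[:, :m] @ L[p, :m]) / np.sqrt(diag[p])
        L[:, m] = col
        diag = np.maximum(diag - col**2, 0.0)
        diag[p] = 0.0             # a selected point cannot be selected again
        pivots.append(p)
    return np.array(pivots), L[:, : len(pivots)]


# Usage: 400 random candidates in [-1, 1]^2, weighted by an unnormalized Gaussian density.
rng = np.random.default_rng(0)
candidates = rng.uniform(-1.0, 1.0, size=(400, 2))
density = np.exp(-0.5 * (candidates**2).sum(axis=1) / 0.5**2)
K = squared_exponential_kernel(candidates, candidates)
pivots, L = pivoted_cholesky_design(K, 10, weights=density)
design = candidates[pivots]       # the selected training locations
```

Because each pivot depends only on the pivots chosen before it, the selected points are nested: the first k points of a longer run coincide with the points of a k-point run. In this sketch the per-step cost is dominated by forming one new column of `L`, which is one plausible reading of the lower complexity the abstract mentions.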

Authors:
Harbrecht, Helmut [1]; Jakeman, John Davis [2]; Zaspel, Peter [3]
  1. Univ. of Basel (Switzerland)
  2. Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
  3. Jacobs Univ., Bremen (Germany)
Publication Date:
2021-02-01
Research Org.:
Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)
OSTI Identifier:
1770338
Report Number(s):
SAND-2020-12052J
Journal ID: ISSN 1815-2406; 691942
Grant/Contract Number:  
AC04-94AL85000
Resource Type:
Accepted Manuscript
Journal Name:
Communications in Computational Physics
Additional Journal Information:
Journal Volume: 29; Journal Issue: 4; Journal ID: ISSN 1815-2406
Publisher:
Global Science Press
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; Experimental design; active learning; Gaussian process; radial basis function; uncertainty quantification; Bayesian inference

Citation Formats

Harbrecht, Helmut, Jakeman, John Davis, and Zaspel, Peter. Cholesky-based experimental design for Gaussian process and kernel-based emulation and calibration. United States: N. p., 2021. Web. doi:10.4208/cicp.OA-2020-0060.
Harbrecht, Helmut, Jakeman, John Davis, & Zaspel, Peter. Cholesky-based experimental design for Gaussian process and kernel-based emulation and calibration. United States. https://doi.org/10.4208/cicp.OA-2020-0060
Harbrecht, Helmut, Jakeman, John Davis, and Zaspel, Peter. 2021. "Cholesky-based experimental design for Gaussian process and kernel-based emulation and calibration." United States. https://doi.org/10.4208/cicp.OA-2020-0060. https://www.osti.gov/servlets/purl/1770338.
@article{osti_1770338,
title = {Cholesky-based experimental design for Gaussian process and kernel-based emulation and calibration.},
author = {Harbrecht, Helmut and Jakeman, John Davis and Zaspel, Peter},
abstractNote = {Gaussian processes and other kernel-based methods are used extensively to construct approximations of multivariate data sets. The accuracy of these approximations depends on the training data used. This paper presents a computationally efficient algorithm that greedily selects training samples to minimize the weighted Lp error of kernel-based approximations for a given number of data points. The method successively generates nested samples, with the goal of minimizing the error in high-probability regions of user-specified densities. The algorithm is simple and can be implemented using existing pivoted Cholesky factorization methods. Training samples are generated in batches, which allows training data to be evaluated (labeled) in parallel. For smooth kernels, the algorithm performs comparably with the greedy integrated variance design but has significantly lower complexity. Numerical experiments demonstrate the efficacy of the approach for bounded, unbounded, multi-modal, and non-tensor-product densities. We also show how to use the proposed algorithm to efficiently generate surrogates for inferring unknown model parameters from data using Bayesian inference.},
doi = {10.4208/cicp.OA-2020-0060},
journal = {Communications in Computational Physics},
number = 4,
volume = 29,
place = {United States},
year = {2021},
month = {2}
}