U.S. Department of Energy
Office of Scientific and Technical Information

An adaptive Hessian approximated stochastic gradient MCMC method

Journal Article · Journal of Computational Physics
 [1];  [2];  [3]
  1. Purdue Univ., West Lafayette, IN (United States). Dept. of Mathematics
  2. Purdue Univ., West Lafayette, IN (United States). Dept. of Mathematics
  3. Purdue Univ., West Lafayette, IN (United States). Dept. of Mathematics; School of Mechanical Engineering; Dept. of Statistics; Dept. of Earth, Atmospheric, and Planetary Sciences
Bayesian approaches have been successfully integrated into the training of deep neural networks. One popular family is stochastic gradient Markov chain Monte Carlo (SG-MCMC) methods, which have gained increasing interest due to their ability to handle large datasets and their potential to avoid overfitting. Although standard SG-MCMC methods have shown great performance in a variety of problems, they may be inefficient when the random variables in the target posterior densities have scale differences or are highly correlated. Here, we present an adaptive Hessian-approximated stochastic gradient MCMC method that incorporates local geometric information while sampling from the posterior. The idea is to apply stochastic approximation (SA) to sequentially update a preconditioning matrix at each iteration. The preconditioner possesses second-order information and can guide the random walk of the sampler efficiently. Instead of computing and storing the full Hessian of the log posterior, we use a limited memory of samples and their stochastic gradients to approximate the inverse Hessian-vector product in the updating formula. Moreover, by smoothly optimizing the preconditioning matrix via SA, the proposed algorithm converges asymptotically to the target distribution with a controllable bias under mild conditions. To reduce the training and testing computational burden, we adopt a magnitude-based weight pruning method to enforce sparsity in the network. Our method is user-friendly and demonstrates better learning results than standard SG-MCMC updating rules. The approximation of the inverse Hessian alleviates the storage and computational complexity for high-dimensional models. Numerical experiments are performed on several problems, including sampling from a 2D correlated distribution, synthetic regression problems, and learning the numerical solution of a heterogeneous elliptic PDE. The numerical results demonstrate great improvement in both convergence rate and accuracy.
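The abstract's limited-memory approximation of the inverse Hessian-vector product can be illustrated with the standard L-BFGS two-loop recursion, which builds the product from stored parameter differences s_k and (stochastic) gradient differences y_k without ever forming the Hessian. The sketch below is a generic illustration under that assumption, not the paper's exact update; the function names, the isotropic noise term, and the initial scaling heuristic are illustrative choices.

```python
import numpy as np

def inv_hessian_vector_product(v, s_list, y_list):
    """Approximate H^{-1} v via the standard L-BFGS two-loop recursion,
    using a limited memory of parameter differences s_k and gradient
    differences y_k (oldest first). Avoids storing the full Hessian."""
    q = v.copy()
    rhos = [1.0 / np.dot(y, s) for s, y in zip(s_list, y_list)]
    alphas = []
    # First loop: newest to oldest pairs.
    for s, y, rho in zip(reversed(s_list), reversed(y_list), reversed(rhos)):
        alpha = rho * np.dot(s, q)
        q -= alpha * y
        alphas.append(alpha)
    # Initial scaling H_0 = gamma * I (a common heuristic choice).
    s, y = s_list[-1], y_list[-1]
    gamma = np.dot(s, y) / np.dot(y, y)
    r = gamma * q
    # Second loop: oldest to newest pairs.
    for s, y, rho, alpha in zip(s_list, y_list, rhos, reversed(alphas)):
        beta = rho * np.dot(y, r)
        r += (alpha - beta) * s
    return r

def preconditioned_sgld_step(theta, stoch_grad, s_list, y_list, lr, rng):
    """One preconditioned Langevin-type step (a simplified sketch, not the
    paper's exact scheme): the drift is preconditioned by the approximate
    inverse Hessian; the noise here is isotropic for simplicity, whereas a
    full method would scale it consistently with the preconditioner."""
    drift = inv_hessian_vector_product(stoch_grad, s_list, y_list)
    noise = np.sqrt(2.0 * lr) * rng.standard_normal(theta.shape)
    return theta - lr * drift + noise
```

For a quadratic log posterior with Hessian A = diag(2, 5) and exact curvature pairs (y_k = A s_k along the coordinate axes), the recursion recovers A^{-1} v exactly, which makes the preconditioning effect of rescaling badly scaled coordinates easy to see.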
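The magnitude-based weight pruning mentioned in the abstract can be sketched generically as follows: zero out the fraction of weights with the smallest absolute values. This is a minimal illustration of the general technique; the paper's exact pruning schedule and thresholding may differ, and `magnitude_prune` is a hypothetical helper name.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out roughly the fraction `sparsity` of entries with the
    smallest absolute value (generic magnitude-based pruning sketch;
    ties at the threshold may prune slightly more than requested)."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    # k-th smallest magnitude serves as the pruning threshold.
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask
```

Enforcing sparsity this way reduces both storage and the cost of forward passes at test time, which matches the abstract's stated motivation.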
Research Organization:
Purdue Univ., West Lafayette, IN (United States)
Sponsoring Organization:
National Science Foundation (NSF); US Army Research Office (ARO); USDOE Office of Science (SC); USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)
Grant/Contract Number:
SC0021142
OSTI ID:
1853727
Alternate ID(s):
OSTI ID: 1775932
OSTI ID: 23203368
Journal Information:
Journal of Computational Physics, Vol. 432, Issue C; ISSN 0021-9991
Publisher:
Elsevier
Country of Publication:
United States
Language:
English

References (10)

Bayesian Posterior Sampling via Stochastic Gradient Fisher Scoring preprint January 2012
On the limited memory BFGS method for large scale optimization journal August 1989
Efficient deep learning techniques for multiphase flow simulation in heterogeneous porous media journal January 2020
Langevin diffusions and the Metropolis-adjusted Langevin algorithm journal August 2014
A mixed multiscale finite element method for elliptic problems with oscillating coefficients journal June 2002
Riemann manifold Langevin and Hamiltonian Monte Carlo methods journal March 2011
A Stochastic Quasi-Newton Method for Large-Scale Optimization journal January 2016
Mixed Generalized Multiscale Finite Element Methods and Applications journal January 2015
Stochastic Quasi-Newton Methods for Nonconvex Stochastic Optimization journal January 2017
A Stochastic Approximation Method journal September 1951

Similar Records

Bayesian sparse learning with preconditioned stochastic gradient MCMC and its applications
Journal Article · February 2021 · Journal of Computational Physics · OSTI ID: 1853726

Laplacian Smoothing Stochastic Gradient Markov Chain Monte Carlo
Journal Article · January 2021 · SIAM Journal on Scientific Computing · OSTI ID: 1866812

Learning Weight Uncertainty with Stochastic Gradient MCMC for Shape Classification
Conference · August 2016 · OSTI ID: 1334875