DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: An adaptive Hessian approximated stochastic gradient MCMC method

Abstract

Bayesian approaches have been successfully integrated into the training of deep neural networks. One popular family is stochastic gradient Markov chain Monte Carlo (SG-MCMC) methods, which have gained increasing interest due to their ability to handle large datasets and their potential to avoid overfitting. Although standard SG-MCMC methods have shown strong performance on a variety of problems, they may be inefficient when the random variables in the target posterior density differ in scale or are highly correlated. Here, we present an adaptive Hessian approximated stochastic gradient MCMC method that incorporates local geometric information while sampling from the posterior. The idea is to apply stochastic approximation (SA) to sequentially update a preconditioning matrix at each iteration. The preconditioner carries second-order information and can guide the random walk of the sampler efficiently. Instead of computing and storing the full Hessian of the log posterior, we use a limited memory of past samples and their stochastic gradients to approximate the inverse Hessian-vector product in the update formula. Moreover, by smoothly optimizing the preconditioning matrix via SA, the proposed algorithm can asymptotically converge to the target distribution with a controllable bias under mild conditions. To reduce the computational burden of training and testing, we adopt a magnitude-based weight pruning method to enforce sparsity in the network. Our method is user-friendly and achieves better learning results than standard SG-MCMC update rules. Approximating the inverse Hessian reduces storage and computational costs for high-dimensional models. Numerical experiments are performed on several problems, including sampling from a 2D correlated distribution, synthetic regression problems, and learning the numerical solutions of heterogeneous elliptic PDEs. The numerical results demonstrate substantial improvements in both convergence rate and accuracy.
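The abstract does not spell out the SA recursion used for the preconditioner, so the following is only a generic illustration of stochastic approximation in the Robbins-Monro sense: a running matrix estimate P is nudged toward each new noisy estimate with a decaying step size w_k = a/(k+b)^alpha, where 0.5 < alpha <= 1 makes the steps sum to infinity while their squares have a finite sum. The names `sa_step_size` and `sa_smooth`, the constants a, b, alpha, and the use of plain noisy matrix observations as the target are assumptions for illustration, not the authors' update.

```python
import numpy as np


def sa_step_size(k, a=1.0, b=10.0, alpha=0.75):
    """Decaying Robbins-Monro step size w_k = a / (k + b)**alpha.

    For 0.5 < alpha <= 1 the steps sum to infinity while their squares
    have a finite sum, the classical SA conditions."""
    return a / (k + b) ** alpha


def sa_smooth(noisy_estimates):
    """Hypothetical SA smoother: P_k = P_{k-1} + w_k * (Hhat_k - P_{k-1}).

    `noisy_estimates` is a sequence of noisy matrix estimates Hhat_k,
    e.g. per-iteration curvature estimates in an adaptive sampler."""
    P = np.array(noisy_estimates[0], dtype=float)
    for k, h_hat in enumerate(noisy_estimates[1:], start=1):
        P += sa_step_size(k) * (np.asarray(h_hat, dtype=float) - P)
    return P


# Usage: smooth noisy observations of a fixed 2x2 SPD matrix.
rng = np.random.default_rng(0)
M = np.array([[2.0, 0.5], [0.5, 1.0]])
noisy = [M + 0.5 * rng.standard_normal((2, 2)) for _ in range(5000)]
P = sa_smooth(noisy)
print(np.round(P, 2))  # close to M; exact digits depend on the random draws
```

With a constant step instead, P would keep fluctuating at a level set by that step; the decay is what lets the observation noise average out.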
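The limited-memory idea can be sketched with the standard L-BFGS two-loop recursion, which returns the product of an inverse-Hessian approximation H with a vector using only the last m pairs s_i = theta_{i+1} - theta_i and y_i = grad U(theta_{i+1}) - grad U(theta_i), where U is the negative log density. Below it drives a preconditioned Langevin update theta <- theta - eps * H g + sqrt(2 eps) * H^{1/2} xi on a 2D Gaussian with correlation 0.95, with g a noisy gradient. This is a sketch under several assumptions, not the authors' algorithm: the target, step size, memory length, and simulated gradient noise are made up; the SA smoothing and any correction term for a state-dependent preconditioner are omitted; and to draw the noise, H is materialized column by column and Cholesky-factored, which is only practical in low dimension. The function names `two_loop` and `lbfgs_sgld` are hypothetical.

```python
import numpy as np
from collections import deque


def two_loop(v, s_hist, y_hist):
    """Return H @ v, where H is the L-BFGS inverse-Hessian approximation
    built from the stored curvature pairs (s_i, y_i), oldest first."""
    q = np.array(v, dtype=float)
    alphas = []
    for s, y in zip(reversed(s_hist), reversed(y_hist)):  # newest -> oldest
        a = (s @ q) / (y @ s)
        alphas.append(a)
        q -= a * y
    # Standard initial scaling H0 = gamma * I, taken from the newest pair.
    if s_hist:
        gamma = (s_hist[-1] @ y_hist[-1]) / (y_hist[-1] @ y_hist[-1])
    else:
        gamma = 1.0
    r = gamma * q
    for s, y, a in zip(s_hist, y_hist, reversed(alphas)):  # oldest -> newest
        b = (y @ r) / (y @ s)
        r += (a - b) * s
    return r


def lbfgs_sgld(grad_u, theta0, n_iter=20000, eps=0.02, memory=5,
               grad_noise=0.1, seed=0):
    """Hypothetical preconditioned Langevin sampler with an L-BFGS H."""
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, dtype=float)
    d = theta.size
    s_hist, y_hist = deque(maxlen=memory), deque(maxlen=memory)
    samples = np.empty((n_iter, d))
    for k in range(n_iter):
        # Materialize H (small d only) so its Cholesky factor can scale the noise.
        H = np.column_stack([two_loop(e, s_hist, y_hist) for e in np.eye(d)])
        H = 0.5 * (H + H.T)
        chol = np.linalg.cholesky(H)
        z = grad_noise * rng.standard_normal(d)  # simulated minibatch noise
        g = grad_u(theta) + z                    # stochastic gradient of U
        theta_new = (theta - eps * (H @ g)
                     + np.sqrt(2.0 * eps) * (chol @ rng.standard_normal(d)))
        # Curvature pair evaluated on the same simulated minibatch (same z).
        s = theta_new - theta
        y = (grad_u(theta_new) + z) - g
        if s @ y > 1e-8 * (s @ s):  # keep only pairs with positive curvature
            s_hist.append(s)
            y_hist.append(y)
        theta = theta_new
        samples[k] = theta
    return samples


# Usage: 2D Gaussian with unit variances and correlation 0.95.
rho = 0.95
prec = np.linalg.inv(np.array([[1.0, rho], [rho, 1.0]]))
samples = lbfgs_sgld(lambda th: prec @ th, theta0=[3.0, -3.0])
print(np.corrcoef(samples[2000:].T)[0, 1])  # compare with the target, 0.95
```

Both gradients in a curvature pair share the same simulated minibatch, as online L-BFGS variants commonly do, so y reflects curvature rather than gradient noise; pairs with s·y <= 0 are skipped so that H stays positive definite.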
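Magnitude-based pruning, in its simplest form, zeroes the weights with the smallest absolute values. The abstract does not specify the threshold rule or pruning schedule, so this sketch assumes a one-shot, per-tensor rule that removes a chosen fraction `sparsity` of the entries; `magnitude_prune_mask` is a hypothetical name. Ties at the threshold are all pruned, so the realized sparsity can slightly exceed the target.

```python
import numpy as np


def magnitude_prune_mask(weights, sparsity):
    """Boolean mask that keeps the largest-magnitude entries of `weights`
    and drops (at least) a `sparsity` fraction of the smallest ones."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must lie in [0, 1]")
    mags = np.abs(np.asarray(weights, dtype=float))
    k = int(np.floor(sparsity * mags.size))  # number of entries to remove
    if k == 0:
        return np.ones(mags.shape, dtype=bool)
    # k-th smallest magnitude; everything at or below it is pruned.
    thresh = np.partition(mags.ravel(), k - 1)[k - 1]
    return mags > thresh


# Usage: prune half of a small weight matrix. In a sampler, the mask would be
# kept and re-applied after every parameter update so pruned entries stay zero.
w = np.array([[0.5, -0.1], [2.0, -0.05]])
mask = magnitude_prune_mask(w, 0.5)
w = np.where(mask, w, 0.0)
print(w)
```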

Authors:
Wang, Yating [1]; Deng, Wei [1]; Lin, Guang [2]
  1. Purdue Univ., West Lafayette, IN (United States). Dept. of Mathematics
  2. Purdue Univ., West Lafayette, IN (United States). Dept. of Mathematics. School of Mechanical Engineering. Dept. of Statistics. Dept. of Earth, Atmospheric, and Planetary Sciences
Publication Date:
February 4, 2021
Research Org.:
Purdue Univ., West Lafayette, IN (United States)
Sponsoring Org.:
National Science Foundation (NSF); US Army Research Office (ARO); USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)
OSTI Identifier:
1853727
Alternate Identifier(s):
OSTI ID: 1775932
Grant/Contract Number:  
SC0021142; DMS-1555072; DMS-1736364; CMMI-1634832; CMMI-1560834; W911NF-15-1-0562
Resource Type:
Accepted Manuscript
Journal Name:
Journal of Computational Physics
Additional Journal Information:
Journal Volume: 432; Journal Issue: C; Journal ID: ISSN 0021-9991
Publisher:
Elsevier
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; adaptive Bayesian method; deep learning; Hessian approximated stochastic gradient MCMC; stochastic approximation; limited memory BFGS; highly correlated density

Citation Formats

Wang, Yating, Deng, Wei, and Lin, Guang. An adaptive Hessian approximated stochastic gradient MCMC method. United States: N. p., 2021. Web. doi:10.1016/j.jcp.2021.110150.
Wang, Yating, Deng, Wei, & Lin, Guang. An adaptive Hessian approximated stochastic gradient MCMC method. United States. https://doi.org/10.1016/j.jcp.2021.110150
Wang, Yating, Deng, Wei, and Lin, Guang. 2021. "An adaptive Hessian approximated stochastic gradient MCMC method". United States. https://doi.org/10.1016/j.jcp.2021.110150. https://www.osti.gov/servlets/purl/1853727.
@article{osti_1853727,
title = {An adaptive Hessian approximated stochastic gradient MCMC method},
author = {Wang, Yating and Deng, Wei and Lin, Guang},
abstractNote = {Bayesian approaches have been successfully integrated into training deep neural networks. One popular family is stochastic gradient Markov chain Monte Carlo methods (SG-MCMC), which have gained increasing interest due to their ability to handle large datasets and the potential to avoid overfitting. Although standard SG-MCMC methods have shown great performance in a variety of problems, they may be inefficient when the random variables in the target posterior densities have scale differences or are highly correlated. Here, we present an adaptive Hessian approximated stochastic gradient MCMC method to incorporate local geometric information while sampling from the posterior. The idea is to apply stochastic approximation (SA) to sequentially update a preconditioning matrix at each iteration. The preconditioner possesses second-order information and can guide the random walk of a sampler efficiently. Instead of computing and saving the full Hessian of the log posterior, we use limited memory of the samples and their stochastic gradients to approximate the inverse Hessian-vector multiplication in the updating formula. Moreover, by smoothly optimizing the preconditioning matrix via SA, our proposed algorithm can asymptotically converge to the target distribution with a controllable bias under mild conditions. To reduce the training and testing computational burden, we adopt a magnitude-based weight pruning method to enforce the sparsity of the network. Our method is user-friendly and demonstrates better learning results compared to standard SG-MCMC updating rules. The approximation of inverse Hessian alleviates storage and computational complexities for large dimensional models. Numerical experiments are performed on several problems, including sampling from 2D correlated distribution, synthetic regression problems, and learning the numerical solutions of heterogeneous elliptic PDE. 
The numerical results demonstrate great improvement in both the convergence rate and accuracy.},
doi = {10.1016/j.jcp.2021.110150},
journal = {Journal of Computational Physics},
number = {C},
volume = 432,
place = {United States},
year = {2021},
month = feb
}
