An adaptive Hessian approximated stochastic gradient MCMC method

Wang, Yating; Deng, Wei; Lin, Guang

doi:10.1016/j.jcp.2021.110150


Abstract

Bayesian approaches have been successfully integrated into training deep neural networks. One popular family is stochastic gradient Markov chain Monte Carlo methods (SG-MCMC), which have gained increasing interest due to their ability to handle large datasets and the potential to avoid overfitting. Although standard SG-MCMC methods have shown great performance in a variety of problems, they may be inefficient when the random variables in the target posterior densities have scale differences or are highly correlated. Here, we present an adaptive Hessian approximated stochastic gradient MCMC method to incorporate local geometric information while sampling from the posterior. The idea is to apply stochastic approximation (SA) to sequentially update a preconditioning matrix at each iteration. The preconditioner possesses second-order information and can guide the random walk of a sampler efficiently. Instead of computing and saving the full Hessian of the log posterior, we use limited memory of the samples and their stochastic gradients to approximate the inverse Hessian-vector multiplication in the updating formula. Moreover, by smoothly optimizing the preconditioning matrix via SA, our proposed algorithm can asymptotically converge to the target distribution with a controllable bias under mild conditions. To reduce the training and testing computational burden, we adopt a magnitude-based weight pruning method to enforce the sparsity of the network. Our method is user-friendly and demonstrates better learning results compared to standard SG-MCMC updating rules. The approximation of inverse Hessian alleviates storage and computational complexities for large dimensional models. Numerical experiments are performed on several problems, including sampling from 2D correlated distribution, synthetic regression problems, and learning the numerical solutions of heterogeneous elliptic PDE. The numerical results demonstrate great improvement in both the convergence rate and accuracy.
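The core loop described in the abstract (keep a short memory of sample and gradient differences, turn it into an approximate inverse-Hessian action, and smooth that into a preconditioner with stochastic approximation) can be illustrated with a small, self-contained Python sketch. This is not the authors' implementation. The L-BFGS two-loop recursion for the inverse Hessian-vector product, the decaying SA weight omega_k = (k+1)^(-0.7), the Cholesky factor used to scale the injected noise, the use of exact rather than minibatch gradients, the 2D Gaussian target with correlation 0.9, and every function name (`two_loop`, `sample`, `cholesky`, `correlation`) are illustrative assumptions. The update it performs is theta_{k+1} = theta_k - eta * M_k * grad U(theta_k) + sqrt(2 eta) * L_k * xi_k with L_k L_k^T = M_k, followed by M_{k+1} = (1 - omega_k) M_k + omega_k H_k, where H_k is the L-BFGS inverse-Hessian approximation built from the last few (s, y) pairs. Magnitude-based pruning is not covered here.

```python
import math
import random


def dot(a, b):
    return sum(x * z for x, z in zip(a, b))


def matvec(m, v):
    return [dot(row, v) for row in m]


def identity(d):
    return [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]


def cholesky(m):
    """Lower-triangular L with L L^T = m (m symmetric positive definite)."""
    d = len(m)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            acc = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][i] = math.sqrt(m[i][i] - acc)
            else:
                low[i][j] = (m[i][j] - acc) / low[j][j]
    return low


def two_loop(v, pairs):
    """L-BFGS two-loop recursion: approximate H^{-1} v from (s, y) pairs, oldest first."""
    q = list(v)
    history = []
    for s, y in reversed(pairs):  # newest to oldest
        rho = 1.0 / dot(y, s)
        alpha = rho * dot(s, q)
        q = [qi - alpha * yi for qi, yi in zip(q, y)]
        history.append((rho, alpha))
    gamma = 1.0
    if pairs:  # scale the initial matrix H0 = gamma * I using the newest pair
        s, y = pairs[-1]
        gamma = dot(s, y) / dot(y, y)
    r = [gamma * qi for qi in q]
    for (s, y), (rho, alpha) in zip(pairs, reversed(history)):  # oldest to newest
        beta = rho * dot(y, r)
        r = [ri + (alpha - beta) * si for ri, si in zip(r, s)]
    return r


def sample(grad_u, theta0, n_iter, eta, memory=5, seed=0):
    """Preconditioned Langevin sampler whose preconditioner M is SA-smoothed L-BFGS."""
    rng = random.Random(seed)
    d = len(theta0)
    theta, g = list(theta0), grad_u(theta0)
    pairs, m, draws = [], identity(d), []
    for k in range(1, n_iter + 1):
        # theta_new = theta - eta * M grad U + sqrt(2 eta) * chol(M) xi
        drift = matvec(m, g)
        noise = matvec(cholesky(m), [rng.gauss(0.0, 1.0) for _ in range(d)])
        new = [t - eta * dr + math.sqrt(2.0 * eta) * n
               for t, dr, n in zip(theta, drift, noise)]
        new_g = grad_u(new)
        s = [a - b for a, b in zip(new, theta)]
        y = [a - b for a, b in zip(new_g, g)]
        if dot(s, y) > 1e-12:  # keep only pairs with positive curvature
            pairs = (pairs + [(s, y)])[-memory:]
        theta, g = new, new_g
        if pairs:
            # Column j of the L-BFGS inverse Hessian is H e_j (materialized: toy sizes only).
            cols = [two_loop(e, pairs) for e in identity(d)]
            omega = (k + 1) ** -0.7  # decaying stochastic-approximation weight
            m = [[(1 - omega) * m[i][j] + omega * 0.5 * (cols[j][i] + cols[i][j])
                  for j in range(d)] for i in range(d)]
        draws.append(theta)
    return draws


def correlation(draws):
    n = len(draws)
    mx = sum(p[0] for p in draws) / n
    my = sum(p[1] for p in draws) / n
    sxx = sum((p[0] - mx) ** 2 for p in draws)
    syy = sum((p[1] - my) ** 2 for p in draws)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in draws)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    rho = 0.9  # hypothetical target: N(0, [[1, rho], [rho, 1]])
    c = 1.0 / (1.0 - rho ** 2)
    precision = [[c, -c * rho], [-c * rho, c]]
    draws = sample(lambda th: matvec(precision, th), [0.0, 0.0], n_iter=20000, eta=0.03)
    print("sample correlation:", correlation(draws[5000:]))
```

Running the file prints the sample correlation of the draws kept after a 5,000-step burn-in; for this hypothetical target it should land near 0.9, up to Monte Carlo error and an O(eta) discretization bias. Because M is adapted from the chain's own history, the chain is not exactly invariant for the target at any finite step; the decaying omega_k is what lets M settle, which is loosely in the spirit of the abstract's "controllable bias" statement. Forming M and its Cholesky factor costs O(d^2) memory and O(d^3) time, which is precisely what a limited-memory scheme is meant to avoid at scale; this sketch does not attempt a matrix-free way to apply M^(1/2).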

Authors:

Wang, Yating ^[1]; Deng, Wei ^[1]; Lin, Guang ^[2]

[1] Purdue Univ., West Lafayette, IN (United States). Dept. of Mathematics
[2] Purdue Univ., West Lafayette, IN (United States). Dept. of Mathematics; School of Mechanical Engineering; Dept. of Statistics; Dept. of Earth, Atmospheric, and Planetary Sciences

Publication Date:: February 4, 2021

Research Org.:: Purdue Univ., West Lafayette, IN (United States)

Sponsoring Org.:: National Science Foundation (NSF); US Army Research Office (ARO); USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)

OSTI Identifier:: 1853727

Alternate Identifier(s):: OSTI ID: 1775932

Grant/Contract Number:: SC0021142; DMS-1555072; DMS-1736364; CMMI-1634832; CMMI-1560834; W911NF-15-1-0562

Resource Type:: Accepted Manuscript

Journal Name:: Journal of Computational Physics

Additional Journal Information:: Journal Volume: 432; Journal Issue: C; Journal ID: ISSN 0021-9991

Publisher:: Elsevier

Country of Publication:: United States

Language:: English

Subject:: 97 MATHEMATICS AND COMPUTING; adaptive Bayesian method; deep learning; Hessian approximated stochastic gradient MCMC; stochastic approximation; limited memory BFGS; highly correlated density

Citation Formats


Wang, Yating, Deng, Wei, and Lin, Guang. An adaptive Hessian approximated stochastic gradient MCMC method. United States: N. p., 2021. Web. doi:10.1016/j.jcp.2021.110150.



Wang, Yating, Deng, Wei, & Lin, Guang. An adaptive Hessian approximated stochastic gradient MCMC method. United States. https://doi.org/10.1016/j.jcp.2021.110150



Wang, Yating, Deng, Wei, and Lin, Guang. 2021. "An adaptive Hessian approximated stochastic gradient MCMC method". United States. https://doi.org/10.1016/j.jcp.2021.110150. https://www.osti.gov/servlets/purl/1853727.

@article{osti_1853727,

  title        = {An adaptive Hessian approximated stochastic gradient MCMC method},

  author       = {Wang, Yating and Deng, Wei and Lin, Guang},

  abstractNote = {Bayesian approaches have been successfully integrated into training deep neural networks. One popular family is stochastic gradient Markov chain Monte Carlo methods (SG-MCMC), which have gained increasing interest due to their ability to handle large datasets and the potential to avoid overfitting. Although standard SG-MCMC methods have shown great performance in a variety of problems, they may be inefficient when the random variables in the target posterior densities have scale differences or are highly correlated. Here, we present an adaptive Hessian approximated stochastic gradient MCMC method to incorporate local geometric information while sampling from the posterior. The idea is to apply stochastic approximation (SA) to sequentially update a preconditioning matrix at each iteration. The preconditioner possesses second-order information and can guide the random walk of a sampler efficiently. Instead of computing and saving the full Hessian of the log posterior, we use limited memory of the samples and their stochastic gradients to approximate the inverse Hessian-vector multiplication in the updating formula. Moreover, by smoothly optimizing the preconditioning matrix via SA, our proposed algorithm can asymptotically converge to the target distribution with a controllable bias under mild conditions. To reduce the training and testing computational burden, we adopt a magnitude-based weight pruning method to enforce the sparsity of the network. Our method is user-friendly and demonstrates better learning results compared to standard SG-MCMC updating rules. The approximation of inverse Hessian alleviates storage and computational complexities for large dimensional models. Numerical experiments are performed on several problems, including sampling from 2D correlated distribution, synthetic regression problems, and learning the numerical solutions of heterogeneous elliptic PDE. 
The numerical results demonstrate great improvement in both the convergence rate and accuracy.},

  doi          = {10.1016/j.jcp.2021.110150},

  journal      = {Journal of Computational Physics},

  number       = {C},

  volume       = 432,

  place        = {United States},

  year         = {2021},

  month        = feb

}


Works referenced in this record:

A mixed multiscale finite element method for elliptic problems with oscillating coefficients
journal, June 2002

Chen, Zhiming; Hou, Thomas Y.
Mathematics of Computation, Vol. 72, Issue 242
DOI: 10.1090/S0025-5718-02-01441-2

A Stochastic Quasi-Newton Method for Large-Scale Optimization
journal, January 2016

Byrd, R. H.; Hansen, S. L.; Nocedal, Jorge
SIAM Journal on Optimization, Vol. 26, Issue 2
DOI: 10.1137/140954362

Mixed Generalized Multiscale Finite Element Methods and Applications
journal, January 2015

Chung, Eric T.; Efendiev, Yalchin; Lee, Chak Shing
Multiscale Modeling & Simulation, Vol. 13, Issue 1
DOI: 10.1137/140970574

Riemann manifold Langevin and Hamiltonian Monte Carlo methods
journal, March 2011

Girolami, Mark; Calderhead, Ben
Journal of the Royal Statistical Society: Series B (Statistical Methodology), Vol. 73, Issue 2
DOI: 10.1111/j.1467-9868.2010.00765.x

Stochastic Quasi-Newton Methods for Nonconvex Stochastic Optimization
journal, January 2017

Wang, Xiao; Ma, Shiqian; Goldfarb, Donald
SIAM Journal on Optimization, Vol. 27, Issue 2
DOI: 10.1137/15M1053141

Langevin diffusions and the Metropolis-adjusted Langevin algorithm
journal, August 2014

Xifara, T.; Sherlock, C.; Livingstone, S.
Statistics & Probability Letters, Vol. 91
DOI: 10.1016/j.spl.2014.04.002

Efficient deep learning techniques for multiphase flow simulation in heterogeneous porous media
journal, January 2020

Wang, Yating; Lin, Guang
Journal of Computational Physics, Vol. 401
DOI: 10.1016/j.jcp.2019.108968

A Stochastic Approximation Method
journal, September 1951

Robbins, Herbert; Monro, Sutton
The Annals of Mathematical Statistics, Vol. 22, Issue 3
DOI: 10.1214/aoms/1177729586

Bayesian Posterior Sampling via Stochastic Gradient Fisher Scoring
preprint, January 2012

Ahn, Sungjin; Korattikara, Anoop; Welling, Max
arXiv
DOI: 10.48550/arxiv.1206.6380

Similar Records in DOE PAGES and OSTI.GOV collections:

Bayesian sparse learning with preconditioned stochastic gradient MCMC and its applications

Journal Article Wang, Yating ; Deng, Wei ; Lin, Guang - Journal of Computational Physics

Deep neural networks have been successfully employed in an extensive variety of research areas, including solving partial differential equations. Despite its significant success, there are some challenges in effectively training DNN, such as avoiding overfitting in over-parameterized DNNs and accelerating the optimization in DNNs with pathological curvature. Here, we propose a Bayesian type sparse deep learning algorithm. The algorithm utilizes a set of spike-and-slab priors for the parameters in the deep neural network. The hierarchical Bayesian mixture will be trained using an adaptive empirical method. That is, one will alternatively sample from the posterior using preconditioned stochastic gradient Langevin Dynamics…
https://doi.org/10.1016/j.jcp.2021.110134

Flow-driven spectral chaos (FSC) method for simulating long-time dynamics of arbitrary-order non-linear stochastic dynamical systems

Journal Article Esquivel, Hugo ; Prakash, Arun ; Lin, Guang - Journal of Computational Physics

Uncertainty quantification techniques such as the time-dependent generalized polynomial chaos (TD-gPC) use an adaptive orthogonal basis to better represent the stochastic part of the solution space (aka random function space) in time. However, because the random function space is constructed using tensor products, TD-gPC-based methods are known to suffer from the curse of dimensionality. Here, we introduce a new numerical method called the flow-driven spectral chaos (FSC) which overcomes this curse of dimensionality at the random-function-space level. The proposed method is not only computationally more efficient than existing TD-gPC-based methods but is also far more accurate. The FSC method uses…
https://doi.org/10.1016/j.jcp.2020.110044

Flow-driven spectral chaos (FSC) method for long-time integration of second-order stochastic dynamical systems

Journal Article Esquivel, Hugo ; Prakash, Arun ; Lin, Guang - Journal of Computational and Applied Mathematics

For decades, uncertainty quantification techniques based on the spectral approach have been demonstrated to be computationally more efficient than the Monte Carlo method for a wide variety of problems, particularly when the dimensionality of the probability space is relatively low. The time-dependent generalized polynomial chaos (TD-gPC) is one such technique that uses an evolving orthogonal basis to better represent the stochastic part of the solution space in time. Here in this paper, we present a new numerical method that uses the concept of enriched stochastic flow maps to track the evolution of the stochastic part of the solution space in…
https://doi.org/10.1016/j.cam.2021.113674

Special Issue: Geostatistics and Machine Learning

Journal Article De Iaco, Sandra ; Hristopulos, Dionissios T. ; Lin, Guang - Mathematical Geosciences

Recent years have seen a steady growth in the number of papers that apply machine learning methods to problems in the earth sciences. Although they have different origins, machine learning and geostatistics share concepts and methods. For example, the kriging formalism can be cast in the machine learning framework of Gaussian process regression. Machine learning, with its focus on algorithms and ability to seek, identify, and exploit hidden structures in big data sets, is providing new tools for exploration and prediction in the earth sciences. Geostatistics, on the other hand, offers interpretable models of spatial (and spatiotemporal) dependence. This…
https://doi.org/10.1007/s11004-022-09998-6

Feature Selection Techniques for a Machine Learning Model to Detect Autonomic Dysreflexia

Journal Article Suresh, Shruthi ; Newton, David T. ; Everett, Thomas H. ; ... - Frontiers in Neuroinformatics

Feature selection plays a crucial role in the development of machine learning algorithms. Understanding the impact of the features on a model, and their physiological relevance can improve the performance. This is particularly helpful in the healthcare domain wherein disease states need to be identified with relatively small quantities of data. Autonomic Dysreflexia (AD) is one such example, wherein mismanagement of this neurological condition could lead to severe consequences for individuals with spinal cord injuries. We explore different methods of feature selection needed to improve the performance of a machine learning model in the detection of the onset of AD. …
https://doi.org/10.3389/fninf.2022.901428
