Bayesian sparse learning with preconditioned stochastic gradient MCMC and its applications

Wang, Yating; Deng, Wei; Lin, Guang

doi:10.1016/j.jcp.2021.110134

Abstract

Deep neural networks (DNNs) have been successfully employed in an extensive variety of research areas, including solving partial differential equations. Despite their significant success, there are some challenges in effectively training DNNs, such as avoiding overfitting in over-parameterized DNNs and accelerating the optimization in DNNs with pathological curvature. Here, we propose a Bayesian-type sparse deep learning algorithm. The algorithm utilizes a set of spike-and-slab priors for the parameters in the deep neural network. The hierarchical Bayesian mixture is trained using an adaptive empirical method. That is, one alternately samples from the posterior using preconditioned stochastic gradient Langevin dynamics (PSGLD) and optimizes the latent variables via stochastic approximation. The sparsity of the network is achieved while optimizing the hyperparameters with adaptive searching and penalizing. A popular SG-MCMC approach is stochastic gradient Langevin dynamics (SGLD). However, considering the complex geometry of the model parameter space in nonconvex learning, updating parameters using a universal step size in each component, as in SGLD, may cause slow mixing. To address this issue, we apply a computationally manageable preconditioner in the updating rule, which provides a step-size parameter that adapts to local geometric properties. Moreover, by smoothly optimizing the hyperparameter in the preconditioning matrix, our proposed algorithm ensures a decreasing bias, which is introduced by ignoring the correction term in the preconditioned SGLD. According to the existing theoretical framework, we show that the proposed algorithm can asymptotically converge to the correct distribution with a controllable bias under mild conditions. Numerical tests are performed on both synthetic regression problems and learning the solutions of elliptic PDEs, which demonstrate the accuracy and efficiency of the present work.
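
The preconditioned update described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a diagonal RMSprop-style preconditioner (a common choice for preconditioned SGLD), drops the correction term as the abstract describes, and samples a toy Gaussian target instead of a network posterior. The names psgld_step, sample_gaussian, beta, and eps are hypothetical and chosen only for this illustration.

```python
import numpy as np


def psgld_step(theta, grad_u, v, step, rng, beta=0.99, eps=1e-5):
    """One preconditioned SGLD step (illustrative sketch, not the paper's code).

    theta  : current parameter vector
    grad_u : (stochastic) gradient of the negative log posterior at theta
    v      : running average of squared gradients (assumed RMSprop-style)
    """
    # Update the running second-moment estimate of the gradient.
    v = beta * v + (1.0 - beta) * grad_u**2
    # Diagonal preconditioner: larger steps where gradients are small.
    g = 1.0 / (eps + np.sqrt(v))
    # Langevin noise scaled by the preconditioner; the correction term
    # that depends on the derivative of g is ignored here.
    noise = rng.normal(size=theta.shape) * np.sqrt(2.0 * step * g)
    theta = theta - step * g * grad_u + noise
    return theta, v


def sample_gaussian(scales, n_steps=500, step=1e-2, seed=0):
    """Sample a zero-mean Gaussian whose components have different scales."""
    rng = np.random.default_rng(seed)
    theta = np.zeros_like(scales)
    v = np.ones_like(scales)  # assumed initialization to avoid a huge first step
    draws = np.empty((n_steps, scales.size))
    for t in range(n_steps):
        grad_u = theta / scales**2  # gradient of sum(theta^2 / (2 * scales^2))
        theta, v = psgld_step(theta, grad_u, v, step, rng)
        draws[t] = theta
    return draws


if __name__ == "__main__":
    draws = sample_gaussian(np.array([1.0, 0.1]))
    print(draws.shape)
```

With a single global step size, the poorly scaled component would force a small step for every component; the per-component factor g is what lets each direction move at its own rate.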

Authors:

Wang, Yating [1]; Deng, Wei [1]; Lin, Guang [2]

[1] Purdue Univ., West Lafayette, IN (United States). Dept. of Mathematics
[2] Purdue Univ., West Lafayette, IN (United States). Dept. of Mathematics; School of Mechanical Engineering; Dept. of Statistics; Dept. of Earth, Atmospheric, and Planetary Sciences

Publication Date: 2021-02-03

Research Org.: Purdue Univ., West Lafayette, IN (United States)

Sponsoring Org.: USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR); National Science Foundation (NSF); US Army Research Office (ARO)

OSTI Identifier: 1853726

Alternate Identifier(s): OSTI ID: 1809418

Grant/Contract Number: SC0021142; DMS-1555072; DMS-1736364; CMMI-1634832; CMMI-1560834; W911NF-15-1-0562

Resource Type: Accepted Manuscript

Journal Name: Journal of Computational Physics

Additional Journal Information: Journal Volume: 432; Journal Issue: C; Journal ID: ISSN 0021-9991

Publisher: Elsevier

Country of Publication: United States

Language: English

Subject: 97 MATHEMATICS AND COMPUTING; 71 CLASSICAL AND QUANTUM MECHANICS, GENERAL PHYSICS; computer science; physics; Bayesian sparse learning; preconditioned stochastic gradient MCMC; deep learning; deep neural network; adaptive hierarchical posterior; stochastic approximation

Citation Formats

Wang, Yating, Deng, Wei, and Lin, Guang. Bayesian sparse learning with preconditioned stochastic gradient MCMC and its applications. United States: N. p., 2021. Web. doi:10.1016/j.jcp.2021.110134.

Wang, Yating, Deng, Wei, & Lin, Guang. Bayesian sparse learning with preconditioned stochastic gradient MCMC and its applications. United States. https://doi.org/10.1016/j.jcp.2021.110134

Wang, Yating, Deng, Wei, and Lin, Guang. 2021. "Bayesian sparse learning with preconditioned stochastic gradient MCMC and its applications". United States. https://doi.org/10.1016/j.jcp.2021.110134. https://www.osti.gov/servlets/purl/1853726.

@article{osti_1853726,
  title        = {Bayesian sparse learning with preconditioned stochastic gradient MCMC and its applications},
  author       = {Wang, Yating and Deng, Wei and Lin, Guang},
  abstractNote = {Deep neural networks (DNNs) have been successfully employed in an extensive variety of research areas, including solving partial differential equations. Despite their significant success, there are some challenges in effectively training DNNs, such as avoiding overfitting in over-parameterized DNNs and accelerating the optimization in DNNs with pathological curvature. Here, we propose a Bayesian-type sparse deep learning algorithm. The algorithm utilizes a set of spike-and-slab priors for the parameters in the deep neural network. The hierarchical Bayesian mixture is trained using an adaptive empirical method. That is, one alternately samples from the posterior using preconditioned stochastic gradient Langevin dynamics (PSGLD) and optimizes the latent variables via stochastic approximation. The sparsity of the network is achieved while optimizing the hyperparameters with adaptive searching and penalizing. A popular SG-MCMC approach is stochastic gradient Langevin dynamics (SGLD). However, considering the complex geometry of the model parameter space in nonconvex learning, updating parameters using a universal step size in each component, as in SGLD, may cause slow mixing. To address this issue, we apply a computationally manageable preconditioner in the updating rule, which provides a step-size parameter that adapts to local geometric properties. Moreover, by smoothly optimizing the hyperparameter in the preconditioning matrix, our proposed algorithm ensures a decreasing bias, which is introduced by ignoring the correction term in the preconditioned SGLD. According to the existing theoretical framework, we show that the proposed algorithm can asymptotically converge to the correct distribution with a controllable bias under mild conditions. Numerical tests are performed on both synthetic regression problems and learning the solutions of elliptic PDEs, which demonstrate the accuracy and efficiency of the present work.},
  doi          = {10.1016/j.jcp.2021.110134},
  journal      = {Journal of Computational Physics},
  number       = {C},
  volume       = {432},
  place        = {United States},
  year         = {2021},
  month        = {2}
}
