OSTI.GOV title logo U.S. Department of Energy
Office of Scientific and Technical Information

Title: Train Like a (Var)Pro: Efficient Training of Neural Networks with Variable Projection

Journal Article · SIAM Journal on Mathematics of Data Science
DOI: https://doi.org/10.1137/20m1359511 · OSTI ID: 1834344

Deep neural networks (DNNs) have achieved state-of-the-art performance across a variety of traditional machine learning tasks, e.g., speech recognition, image classification, and segmentation. The ability of DNNs to efficiently approximate high-dimensional functions has also motivated their use in scientific applications, e.g., to solve partial differential equations and to generate surrogate models. In this paper, we consider the supervised training of DNNs, which arises in many of the above applications. We focus on the central problem of optimizing the weights of the given DNN such that it accurately approximates the relation between observed input and target data. Devising effective solvers for this optimization problem is notoriously challenging due to the large number of weights, nonconvexity, data sparsity, and nontrivial choice of hyperparameters. To solve the optimization problem more efficiently, we propose the use of variable projection (VarPro), a method originally designed for separable nonlinear least-squares problems. Our main contribution is the Gauss–Newton VarPro method (GNvpro), which extends the reach of the VarPro idea to nonquadratic objective functions, most notably the cross-entropy loss functions arising in classification. These extensions make GNvpro applicable to all training problems that involve a DNN whose last layer is an affine mapping, which is common in many state-of-the-art architectures. In our four numerical experiments from surrogate modeling, segmentation, and classification, GNvpro solves the optimization problem more efficiently than commonly used stochastic gradient descent (SGD) schemes. Finally, GNvpro finds solutions that generalize well to unseen data points, in all but one example better than well-tuned SGD methods.
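The core VarPro idea described above can be illustrated on a toy separable problem. The sketch below is not the paper's GNvpro method; it shows only the projection step, under assumed names (`features`, `projected_loss`) and with a crude random search standing in for the Gauss–Newton scheme: for each candidate setting of the nonlinear parameters theta, the linear weights W of the final affine layer are eliminated in closed form via linear least squares, so the outer optimization runs over theta alone.

```python
import numpy as np

# Toy regression data (assumed for illustration only).
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 1))                           # inputs, shape (n, 1)
Y = np.sin(3.0 * X) + 0.05 * rng.standard_normal(X.shape)   # noisy targets

def features(theta, X):
    """Nonlinear feature map: one hidden tanh layer with 5 units.

    theta packs the hidden weights (5) and biases (5); the model is
    f(x) = features(theta, x) @ W, with W entering linearly.
    """
    w, b = theta[:5], theta[5:]
    return np.tanh(X @ w[None, :] + b[None, :])             # shape (n, 5)

def projected_loss(theta):
    """VarPro objective: eliminate the linear weights W in closed form.

    For fixed theta, the optimal W solves a linear least-squares problem,
    so the loss becomes a function of theta only.
    """
    Phi = features(theta, X)
    W, *_ = np.linalg.lstsq(Phi, Y, rcond=None)             # optimal W given theta
    r = Phi @ W - Y
    return 0.5 * np.sum(r**2)

# Optimize only over the nonlinear parameters theta. A random search is
# used here purely to keep the sketch short; the paper's GNvpro applies a
# Gauss--Newton scheme to this reduced objective instead.
best_theta = rng.standard_normal(10)
best_loss = projected_loss(best_theta)
for _ in range(500):
    cand = best_theta + 0.1 * rng.standard_normal(10)
    loss = projected_loss(cand)
    if loss < best_loss:
        best_theta, best_loss = cand, loss
```

Because W is recomputed exactly at every step, the search never has to coordinate linear and nonlinear weights, which is the efficiency argument the abstract makes for training DNNs whose last layer is affine.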

Research Organization:
Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
Sponsoring Organization:
USDOE National Nuclear Security Administration (NNSA)
Grant/Contract Number:
NA0003525; 2003941; DMS 1751636
OSTI ID:
1834344
Report Number(s):
SAND-2020-8481J; 689974
Journal Information:
SIAM Journal on Mathematics of Data Science, Vol. 3, Issue 4; ISSN 2577-0187
Publisher:
Society for Industrial and Applied Mathematics (SIAM)
Country of Publication:
United States
Language:
English

