DOE PAGES · U.S. Department of Energy, Office of Scientific and Technical Information

Title: Multilevel-in-width training for deep neural network regression

Journal Article · Numerical Linear Algebra with Applications
DOI: https://doi.org/10.1002/nla.2501 · OSTI ID: 1987608
Authors: [1]; [1]; [1]; [2]
  1. Lawrence Livermore National Laboratory (LLNL), Livermore, CA (United States)
  2. Portland State Univ., OR (United States)

Abstract: A common challenge in regression is that, for many problems, the degrees of freedom required for a high-quality solution also allow for overfitting. Regularization is a class of strategies that seek to restrict the range of possible solutions so as to discourage overfitting while still enabling good solutions, and different regularization strategies impose different types of restrictions. In this paper, we present a multilevel regularization strategy that constructs and trains a hierarchy of neural networks, each of which has layers that are wider versions of the previous network's layers. We draw intuition and techniques from the field of Algebraic Multigrid (AMG), traditionally used for solving linear and nonlinear systems of equations, and specifically adapt the Full Approximation Scheme (FAS) for nonlinear systems to the problem of deep learning. Training through V-cycles then encourages the neural networks to build a hierarchical understanding of the problem. We refer to this approach as multilevel-in-width to distinguish it from prior multilevel works, which hierarchically alter the depth of neural networks. The resulting approach is a highly flexible framework that can be applied to a variety of layer types, which we demonstrate with both fully connected and convolutional layers. We show experimentally on PDE regression problems that our multilevel training approach is an effective regularizer, improving the generalization performance of the neural networks studied.
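As an informal illustration of the setup the abstract describes (a minimal sketch, not the authors' implementation), the snippet below builds a hierarchy of fully connected regression networks whose hidden layers double in width from level to level, together with simple piecewise-constant prolongation/restriction operators between widths, used here to initialize each coarser network from the next finer one. The width schedule, two-hidden-layer architecture, and choice of transfer operators are illustrative assumptions; the FAS tau correction and the V-cycle training schedule discussed in the paper are omitted.

```python
# Sketch only: a width hierarchy of fully connected networks with simple
# piecewise-constant transfer operators between levels. The paper's actual
# transfer operators, FAS correction, and V-cycle schedule are not shown.
import torch
import torch.nn as nn

IN_DIM, OUT_DIM = 2, 1  # assumed problem dimensions for illustration

def make_mlp(width: int) -> nn.Sequential:
    """Two-hidden-layer fully connected regression network of the given hidden width."""
    return nn.Sequential(
        nn.Linear(IN_DIM, width), nn.Tanh(),
        nn.Linear(width, width), nn.Tanh(),
        nn.Linear(width, OUT_DIM),
    )

def prolongation(coarse_w: int, fine_w: int) -> torch.Tensor:
    """Piecewise-constant prolongation P of shape (fine_w, coarse_w): each coarse
    neuron is refined into fine_w // coarse_w fine neurons (assumes divisibility)."""
    reps = fine_w // coarse_w
    return torch.eye(coarse_w).repeat_interleave(reps, dim=0)

# Hierarchy from coarse to fine: hidden width doubles at each level (assumed schedule).
widths = [8, 16, 32]
nets = [make_mlp(w) for w in widths]

# Initialize each coarser network from the finer one via R * W * P with R = P^T / 2,
# mirroring the Galerkin triple product used for coarse operators in AMG.
with torch.no_grad():
    for lvl in range(len(nets) - 1, 0, -1):
        fine, coarse = nets[lvl], nets[lvl - 1]
        wf, wc = widths[lvl], widths[lvl - 1]
        P = prolongation(wc, wf)          # (wf, wc)
        R = P.t() / (wf // wc)            # (wc, wf), averaging restriction
        coarse[2].weight.copy_(R @ fine[2].weight @ P)   # hidden-to-hidden layer
        coarse[2].bias.copy_(R @ fine[2].bias)
        coarse[0].weight.copy_(R @ fine[0].weight)       # input layer: restrict output width only
        coarse[0].bias.copy_(R @ fine[0].bias)
        coarse[4].weight.copy_(fine[4].weight @ P)       # output layer: restrict input width only
        coarse[4].bias.copy_(fine[4].bias)
```

In a multilevel training loop, operators like P and R would move parameters (and, in an FAS-style scheme, residual corrections) between the wide and narrow networks during each V-cycle; the piecewise-constant choice here is only one simple possibility.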

Research Organization:
Lawrence Livermore National Laboratory (LLNL), Livermore, CA (United States)
Sponsoring Organization:
USDOE National Nuclear Security Administration (NNSA)
Grant/Contract Number:
AC52-07NA27344
OSTI ID:
1987608
Alternate ID(s):
OSTI ID: 1974473
Report Number(s):
LLNL-JRNL-827231; 1042116
Journal Information:
Numerical Linear Algebra with Applications, Vol. 30, Issue 5; ISSN 1070-5325
Publisher:
Wiley
Country of Publication:
United States
Language:
English
