U.S. Department of Energy
Office of Scientific and Technical Information

Improving Deep Neural Networks’ Training for Image Classification With Nonlinear Conjugate Gradient-Style Adaptive Momentum

Journal Article · IEEE Transactions on Neural Networks and Learning Systems
 [1];  [2]
  1. Department of Mathematics, University of Utah, Salt Lake City, UT (United States)
  2. University of Kentucky, Lexington, KY (United States)

Momentum is crucial in stochastic gradient-based optimization algorithms for accelerating and improving the training of deep neural networks (DNNs). In deep learning practice, the momentum is usually weighted by a well-calibrated constant. However, tuning this hyperparameter can be a significant computational burden. In this article, we propose a novel adaptive momentum for improving DNN training; this adaptive momentum, which requires no momentum-related hyperparameter, is motivated by the nonlinear conjugate gradient (NCG) method. Stochastic gradient descent (SGD) with this new adaptive momentum eliminates the need for momentum hyperparameter calibration, allows a significantly larger learning rate, accelerates DNN training, and improves the final accuracy and robustness of the trained DNNs. For example, SGD with this adaptive momentum reduces classification errors for training ResNet110 on CIFAR10 and CIFAR100 from 5.25% to 4.64% and from 23.75% to 20.03%, respectively. Furthermore, SGD with the new adaptive momentum also benefits adversarial training and, hence, improves the adversarial robustness of the trained DNNs.
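The abstract does not give the paper's exact momentum formula, but the NCG connection suggests the general shape of such an update: instead of a fixed momentum constant, the coefficient is recomputed each step from consecutive gradients, as in classical conjugate gradient methods. The sketch below is an illustrative assumption using the Polak–Ribière coefficient (clipped at zero, a standard restart safeguard), not the authors' method; the function name `ncg_momentum_gd` and the toy quadratic objective are invented for the example.

```python
import numpy as np

def ncg_momentum_gd(grad_fn, w, lr=0.05, steps=200):
    """Gradient descent with an NCG-style adaptive momentum coefficient.

    Illustrative sketch only: the momentum weight beta is computed each
    step with the Polak-Ribiere formula (clipped at 0), so no momentum
    hyperparameter needs hand-tuning. The paper's actual coefficient may
    differ; the abstract does not specify it.
    """
    g_prev = grad_fn(w)
    d = -g_prev                      # initial search direction: steepest descent
    w = w + lr * d
    for _ in range(steps - 1):
        g = grad_fn(w)
        # Adaptive momentum coefficient (Polak-Ribiere, non-negative):
        # large when successive gradients differ, zero on "restart".
        beta = max(0.0, g @ (g - g_prev) / (g_prev @ g_prev + 1e-12))
        d = -g + beta * d            # conjugate-gradient-style direction update
        w = w + lr * d
        g_prev = g
    return w

# Toy deterministic example: minimize f(w) = 0.5 * w^T A w, minimum at 0.
A = np.diag([1.0, 10.0])
grad = lambda w: A @ w
w_final = ncg_momentum_gd(grad, np.array([5.0, 5.0]))
print(np.linalg.norm(w_final))
```

In a stochastic setting the same coefficient would be computed from minibatch gradients; the clipping at zero acts as an automatic restart when the conjugacy assumption breaks down, which is one reason such schemes tolerate larger learning rates than a fixed momentum constant.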

Research Organization:
University of Utah, Salt Lake City, UT (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR); National Science Foundation (NSF)
Grant/Contract Number:
SC0021142; SC0023490
OSTI ID:
2280651
Journal Information:
IEEE Transactions on Neural Networks and Learning Systems, Vol. 35, Issue 9; ISSN 2162-237X
Publisher:
IEEE Computational Intelligence Society
Country of Publication:
United States
Language:
English

