U.S. Department of Energy
Office of Scientific and Technical Information

A deterministic gradient-based approach to avoid saddle points

Journal Article · European Journal of Applied Mathematics

Abstract

Loss functions with a large number of saddle points are one of the major obstacles for training modern machine learning (ML) models efficiently. First-order methods such as gradient descent (GD) are usually the methods of choice for training ML models. However, these methods converge to saddle points for certain choices of initial guesses. In this paper, we propose a modification of the recently proposed Laplacian smoothing gradient descent (LSGD) [Osher et al., arXiv:1806.06317], called modified LSGD (mLSGD), and demonstrate its potential to avoid saddle points without sacrificing the convergence rate. Our analysis is based on the attraction region, formed by all starting points for which the considered numerical scheme converges to a saddle point. We investigate the attraction region’s dimension both analytically and numerically. For a canonical class of quadratic functions, we show that the dimension of the attraction region for mLSGD is $\lfloor (n-1)/2\rfloor$, and hence it is significantly smaller than that of GD, whose dimension is $n-1$.
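For context, the scheme modified here, Laplacian smoothing gradient descent (LSGD) [Osher et al., arXiv:1806.06317], replaces the gradient $g$ by the smoothed gradient $(I - \sigma L)^{-1} g$, where $L$ is the one-dimensional discrete Laplacian with periodic boundary conditions; since $I - \sigma L$ is circulant, the solve can be done with the FFT. Below is a minimal sketch of one such step under that formulation. The names `lsgd_step`, `sigma`, and `lr` are illustrative, and the mLSGD modification introduced in the paper is not reproduced here.

```python
import numpy as np

def lsgd_step(x, grad_f, sigma=1.0, lr=0.1):
    """One Laplacian smoothing gradient descent (LSGD) step (sketch).

    The raw gradient g is replaced by the smoothed gradient solving
    (I - sigma * L) g_s = g, with L the 1D discrete Laplacian under
    periodic boundary conditions. I - sigma*L is circulant, so the
    solve costs O(n log n) via the FFT.
    """
    n = x.size
    g = grad_f(x)
    # Eigenvalues of I - sigma*L in the discrete Fourier basis:
    # 1 + 2*sigma*(1 - cos(2*pi*k/n)), k = 0, ..., n-1.
    eig = 1.0 + 2.0 * sigma * (1.0 - np.cos(2.0 * np.pi * np.arange(n) / n))
    g_smooth = np.real(np.fft.ifft(np.fft.fft(g) / eig))
    return x - lr * g_smooth

# Toy usage on a quadratic with a strict saddle at the origin:
# f(x) = 0.5 * x^T diag(d) x, with one negative entry in d.
d = np.array([1.0, 1.0, -1.0, 1.0])
grad_f = lambda x: d * x

x = np.array([1.0, -0.5, 0.2, 0.3])
for _ in range(200):
    x = lsgd_step(x, grad_f)
print(x)  # the negative-curvature component grows, so the iterate leaves the saddle
```

In this toy example the smoothing preserves an escape direction: $(I - \sigma L)^{-1}\,\mathrm{diag}(d)$ has the same number of negative eigenvalues as $\mathrm{diag}(d)$ by Sylvester's law of inertia, so generic starting points still move away from the saddle.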

Research Organization:
Hysitron, Inc., Minneapolis, MN (United States); Purdue Univ., West Lafayette, IN (United States)
Sponsoring Organization:
USDOE Office of Science (SC)
DOE Contract Number:
SC0002722; SC0021142
OSTI ID:
2419645
Journal Information:
European Journal of Applied Mathematics, Vol. 34, Issue 4; ISSN 0956-7925
Publisher:
Cambridge University Press
Country of Publication:
United States
Language:
English
