Accelerating gradient descent and Adam via fractional gradients
- Korea Advanced Inst. of Science and Technology (KAIST), Daejeon (Korea, Republic of); Brown Univ., Providence, RI (United States)
- Brown Univ., Providence, RI (United States)
Here we propose a novel class of fractional-order optimization algorithms. We define a fractional-order gradient via Caputo fractional derivatives that generalizes the integer-order gradient. We refer to it as the Caputo fractional-based gradient and develop an efficient implementation to compute it. A general class of fractional-order optimization methods is then obtained by replacing integer-order gradients with Caputo fractional-based gradients. To give concrete algorithms, we consider gradient descent (GD) and Adam, and extend them to the Caputo fractional GD (CfGD) and the Caputo fractional Adam (CfAdam). We demonstrate the superiority of CfGD and CfAdam on several large-scale optimization problems that arise in scientific machine learning applications, such as ill-conditioned least squares problems on real-world data and the training of neural networks with non-convex objective functions. Numerical examples show that both CfGD and CfAdam accelerate over GD and Adam, respectively. We also derive error bounds of CfGD for quadratic functions, which further indicate that CfGD could mitigate the dependence on the condition number in the rate of convergence and result in significant acceleration over GD.
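As context for the abstract, the display below is a minimal sketch: the standard one-dimensional Caputo fractional derivative of order α ∈ (0, 1), a componentwise fractional gradient built from it, and the generic CfGD-style update in which this fractional gradient replaces the usual gradient. The componentwise form, the lower terminals a_i, and the step size η are illustrative assumptions; the record above does not specify the authors' exact construction or their efficient implementation.

\[
{}^{C}\!D_{a}^{\alpha} f(x)
  = \frac{1}{\Gamma(1-\alpha)} \int_{a}^{x} \frac{f'(t)}{(x-t)^{\alpha}}\, dt,
  \qquad 0 < \alpha < 1,
\]
\[
\bigl(\nabla^{\alpha} f(x)\bigr)_{i}
  = \frac{1}{\Gamma(1-\alpha)}
    \int_{a_i}^{x_i}
    \frac{\partial_i f(x_1,\dots,x_{i-1},t,x_{i+1},\dots,x_d)}{(x_i - t)^{\alpha}}\, dt,
  \qquad
  x_{k+1} = x_k - \eta\, \nabla^{\alpha} f(x_k).
\]

For α → 1 the Caputo derivative recovers the ordinary derivative, so the update above reduces to standard GD; CfAdam is obtained analogously by feeding the fractional gradient into the Adam moment estimates.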
- Research Organization:
- Brown Univ., Providence, RI (United States)
- Sponsoring Organization:
- USDOE; US Army Research Office (ARO); US Air Force Office of Scientific Research (AFOSR)
- Grant/Contract Number:
- SC0019453
- OSTI ID:
- 2282013
- Journal Information:
- Neural Networks, Vol. 161; ISSN 0893-6080
- Publisher:
- Elsevier
- Country of Publication:
- United States
- Language:
- English
Similar Records
Convergence of Hyperbolic Neural Networks Under Riemannian Stochastic Gradient Descent
Stochastic gradient descent algorithm for stochastic optimization in solving analytic continuation problems