Quantifying the generalization error in deep learning in terms of data distribution and neural network smoothness
Abstract
We report that the accuracy of deep learning, i.e., deep neural networks, can be characterized by dividing the total error into three main types: approximation error, optimization error, and generalization error. Whereas there are some satisfactory answers to the problems of approximation and optimization, much less is known about the theory of generalization. Most existing theoretical works on generalization fail to explain the performance of neural networks in practice. To derive a meaningful bound, we study the generalization error of neural networks for classification problems in terms of data distribution and neural network smoothness. We introduce the cover complexity (CC) to measure the difficulty of learning a data set and the inverse of the modulus of continuity to quantify neural network smoothness. A quantitative bound for expected accuracy/error is derived by considering both the CC and neural network smoothness. Although most of the analysis is general and not specific to neural networks, we validate our theoretical assumptions and results numerically for neural networks on several image data sets. The numerical results confirm that the expected error of trained networks, scaled by the square root of the number of classes, has a linear relationship with the CC. We also observe a clear consistency between test loss and neural network smoothness during the training process. In addition, we demonstrate empirically that neural network smoothness decreases as the network size increases, whereas the smoothness is insensitive to the training data set size.
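The abstract quantifies network smoothness through the (inverse) modulus of continuity. As a rough illustration of how such a quantity might be probed numerically, the Python sketch below estimates a Monte Carlo lower bound on omega(delta) = sup over ||x - x'|| <= delta of ||f(x) - f(x')|| for a toy network by perturbing inputs in random directions of norm delta. The network, data, helper name empirical_modulus_of_continuity, and sampling scheme are illustrative assumptions, not the estimator used in the paper.

# Minimal sketch (assumed setup, not the paper's method): estimate a lower
# bound on the modulus of continuity omega(delta) of a network f by sampling
# random perturbations of norm delta around given inputs.
import torch

def empirical_modulus_of_continuity(model, inputs, delta, n_dirs=64):
    """Crude Monte Carlo lower bound on omega(delta): perturb each input in
    random directions of norm delta and record the largest output change."""
    model.eval()
    worst = 0.0
    with torch.no_grad():
        base = model(inputs)                          # f(x)
        for _ in range(n_dirs):
            noise = torch.randn_like(inputs)
            norms = noise.flatten(1).norm(dim=1).clamp_min(1e-12)
            noise = delta * noise / norms.view(-1, *([1] * (inputs.dim() - 1)))
            pert = model(inputs + noise)              # f(x + dx), ||dx|| = delta
            worst = max(worst, (pert - base).flatten(1).norm(dim=1).max().item())
    return worst

if __name__ == "__main__":
    torch.manual_seed(0)
    net = torch.nn.Sequential(torch.nn.Linear(784, 128), torch.nn.ReLU(),
                              torch.nn.Linear(128, 10))
    x = torch.randn(256, 784)                         # stand-in for flattened images
    for d in (0.01, 0.1, 1.0):
        print(f"delta={d:5.2f}  omega(delta) >= {empirical_modulus_of_continuity(net, x, d):.4f}")

A larger estimate at a fixed delta indicates a less smooth network; per the abstract, the paper's bound combines such a smoothness measure with the cover complexity of the data set.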
- Authors:
- Jin, Pengzhan; Lu, Lu; Tang, Yifa; Karniadakis, George Em
- Chinese Academy of Sciences (CAS), Beijing (China); Univ. of Chinese Academy of Sciences, Beijing (China)
- Brown Univ., Providence, RI (United States)
- Publication Date:
- July 3, 2020
- Research Org.:
- Brown Univ., Providence, RI (United States)
- Sponsoring Org.:
- USDOE Office of Science (SC); US Air Force Office of Scientific Research (AFOSR); Defense Advanced Research Projects Agency (DARPA); Minister of Science and Technology of China (MOST); National Natural Science Foundation of China (NSFC)
- OSTI Identifier:
- 1853302
- Alternate Identifier(s):
- OSTI ID: 1637558; OSTI ID: 2281730
- Grant/Contract Number:
- SC0019453; FA9550-17-1-0013; HR00111990025; 2018AAA0101002; 11771438
- Resource Type:
- Accepted Manuscript
- Journal Name:
- Neural Networks
- Additional Journal Information:
- Journal Volume: 130; Journal Issue: C; Journal ID: ISSN 0893-6080
- Publisher:
- Elsevier
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 97 MATHEMATICS AND COMPUTING; neural networks; generalization error; learnability; data distribution; cover complexity; neural network smoothness
Citation Formats
Jin, Pengzhan, Lu, Lu, Tang, Yifa, and Karniadakis, George Em. Quantifying the generalization error in deep learning in terms of data distribution and neural network smoothness. United States: N. p., 2020.
Web. doi:10.1016/j.neunet.2020.06.024.
Jin, Pengzhan, Lu, Lu, Tang, Yifa, & Karniadakis, George Em. Quantifying the generalization error in deep learning in terms of data distribution and neural network smoothness. United States. https://doi.org/10.1016/j.neunet.2020.06.024
Jin, Pengzhan, Lu, Lu, Tang, Yifa, and Karniadakis, George Em. 2020. "Quantifying the generalization error in deep learning in terms of data distribution and neural network smoothness". United States. https://doi.org/10.1016/j.neunet.2020.06.024. https://www.osti.gov/servlets/purl/1853302.
@article{osti_1853302,
title = {Quantifying the generalization error in deep learning in terms of data distribution and neural network smoothness},
author = {Jin, Pengzhan and Lu, Lu and Tang, Yifa and Karniadakis, George Em},
abstractNote = {We report that the accuracy of deep learning, i.e., deep neural networks, can be characterized by dividing the total error into three main types: approximation error, optimization error, and generalization error. Whereas there are some satisfactory answers to the problems of approximation and optimization, much less is known about the theory of generalization. Most existing theoretical works on generalization fail to explain the performance of neural networks in practice. To derive a meaningful bound, we study the generalization error of neural networks for classification problems in terms of data distribution and neural network smoothness. We introduce the cover complexity (CC) to measure the difficulty of learning a data set and the inverse of the modulus of continuity to quantify neural network smoothness. A quantitative bound for expected accuracy/error is derived by considering both the CC and neural network smoothness. Although most of the analysis is general and not specific to neural networks, we validate our theoretical assumptions and results numerically for neural networks on several image data sets. The numerical results confirm that the expected error of trained networks, scaled by the square root of the number of classes, has a linear relationship with the CC. We also observe a clear consistency between test loss and neural network smoothness during the training process. In addition, we demonstrate empirically that neural network smoothness decreases as the network size increases, whereas the smoothness is insensitive to the training data set size.},
doi = {10.1016/j.neunet.2020.06.024},
journal = {Neural Networks},
number = {C},
volume = {130},
place = {United States},
year = {2020},
month = {7}
}