DOE PAGES
U.S. Department of Energy, Office of Scientific and Technical Information

Title: Quantifying the generalization error in deep learning in terms of data distribution and neural network smoothness

Abstract

We report that the accuracy of deep learning, i.e., of deep neural networks, can be characterized by dividing the total error into three main types: approximation error, optimization error, and generalization error. Whereas there are some satisfactory answers to the problems of approximation and optimization, much less is known about the theory of generalization. Most existing theoretical works on generalization fail to explain the performance of neural networks in practice. To derive a meaningful bound, we study the generalization error of neural networks for classification problems in terms of data distribution and neural network smoothness. We introduce the cover complexity (CC) to measure the difficulty of learning a data set, and the inverse of the modulus of continuity to quantify neural network smoothness. A quantitative bound for the expected accuracy/error is derived by considering both the CC and the neural network smoothness. Although most of the analysis is general and not specific to neural networks, we validate our theoretical assumptions and results numerically for neural networks on several image data sets. The numerical results confirm that the expected error of trained networks, scaled by the square root of the number of classes, depends linearly on the CC. We also observe a clear consistency between the test loss and the neural network smoothness during training. In addition, we demonstrate empirically that the neural network smoothness decreases as the network size increases, whereas it is insensitive to the training data set size.
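
To orient the reader, here is a minimal sketch of the two quantities named above, in standard notation; the symbols \(\omega_f\), \(\delta\), \(\varepsilon\), \(K\), \(\widetilde{\mathcal{E}}\), \(a\), and \(b\) are illustrative choices, and the paper's precise definitions of the cover complexity and of the smoothness measure may differ in detail. For a network \(f\), the modulus of continuity and its inverse are

\[
\omega_f(\delta) = \sup_{\|x - y\| \le \delta} \|f(x) - f(y)\|,
\qquad
\omega_f^{-1}(\varepsilon) = \sup\{\delta \ge 0 : \omega_f(\delta) \le \varepsilon\},
\]

so a larger inverse modulus corresponds to a smoother network. The empirical trend reported in the abstract then amounts to a linear fit of the form

\[
\widetilde{\mathcal{E}} \approx a \cdot \mathrm{CC} + b,
\]

where \(\widetilde{\mathcal{E}}\) denotes the expected error of the trained network scaled by \(\sqrt{K}\) for a \(K\)-class problem, and \(a\), \(b\) are constants fitted across data sets.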

Authors:
Jin, Pengzhan [1]; Lu, Lu [2]; Tang, Yifa [1]; Karniadakis, George Em [2]
  1. Chinese Academy of Sciences (CAS), Beijing (China); Univ. of Chinese Academy of Sciences, Beijing (China)
  2. Brown Univ., Providence, RI (United States)
Publication Date:
July 3, 2020
Research Org.:
Brown Univ., Providence, RI (United States)
Sponsoring Org.:
USDOE Office of Science (SC); US Air Force Office of Scientific Research (AFOSR); Defense Advanced Research Projects Agency (DARPA); Minister of Science and Technology of China (MOST); National Natural Science Foundation of China (NSFC)
OSTI Identifier:
1853302
Alternate Identifier(s):
OSTI ID: 1637558; OSTI ID: 2281730
Grant/Contract Number:  
SC0019453; FA9550-17-1-0013; HR00111990025; 2018AAA0101002; 11771438
Resource Type:
Accepted Manuscript
Journal Name:
Neural Networks
Additional Journal Information:
Journal Volume: 130; Journal Issue: C; Journal ID: ISSN 0893-6080
Publisher:
Elsevier
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; neural networks; generalization error; learnability; data distribution; cover complexity; neural network smoothness

Citation Formats

Jin, Pengzhan, Lu, Lu, Tang, Yifa, and Karniadakis, George Em. Quantifying the generalization error in deep learning in terms of data distribution and neural network smoothness. United States: N. p., 2020. Web. doi:10.1016/j.neunet.2020.06.024.
Jin, Pengzhan, Lu, Lu, Tang, Yifa, & Karniadakis, George Em. Quantifying the generalization error in deep learning in terms of data distribution and neural network smoothness. United States. https://doi.org/10.1016/j.neunet.2020.06.024
Jin, Pengzhan, Lu, Lu, Tang, Yifa, and Karniadakis, George Em. Fri Jul 03, 2020. "Quantifying the generalization error in deep learning in terms of data distribution and neural network smoothness". United States. https://doi.org/10.1016/j.neunet.2020.06.024. https://www.osti.gov/servlets/purl/1853302.
@article{osti_1853302,
title = {Quantifying the generalization error in deep learning in terms of data distribution and neural network smoothness},
author = {Jin, Pengzhan and Lu, Lu and Tang, Yifa and Karniadakis, George Em},
abstractNote = {We report that the accuracy of deep learning, i.e., deep neural networks, can be characterized by dividing the total error into three main types: approximation error, optimization error, and generalization error. Whereas there are some satisfactory answers to the problems of approximation and optimization, much less is known about the theory of generalization. Most existing theoretical works for generalization fail to explain the performance of neural networks in practice. To derive a meaningful bound, we study the generalization error of neural networks for classification problems in terms of data distribution and neural network smoothness. We introduce the cover complexity (CC) to measure the difficulty of learning a data set and the inverse of the modulus of continuity to quantify neural network smoothness. A quantitative bound for expected accuracy/error is derived by considering both the CC and neural network smoothness. Although most of the analysis is general and not specific to neural networks, we validate our theoretical assumptions and results numerically for neural networks by several data sets of images. The numerical results confirm that the expected error of trained networks scaled with the square root of the number of classes has a linear relationship with respect to the CC. We also observe a clear consistency between test loss and neural network smoothness during the training process. In addition, we demonstrate empirically that the neural network smoothness decreases when the network size increases whereas the smoothness is insensitive to training dataset size.},
doi = {10.1016/j.neunet.2020.06.024},
journal = {Neural Networks},
number = {C},
volume = {130},
place = {United States},
year = {2020},
month = {7}
}
