DOE PAGES
U.S. Department of Energy, Office of Scientific and Technical Information

Title: Quantifying the generalization error in deep learning in terms of data distribution and neural network smoothness

Abstract

We report that the accuracy of deep learning, i.e., of deep neural networks, can be characterized by dividing the total error into three main types: approximation error, optimization error, and generalization error. Whereas there are some satisfactory answers to the problems of approximation and optimization, much less is known about the theory of generalization. Most existing theoretical works on generalization fail to explain the performance of neural networks in practice. To derive a meaningful bound, we study the generalization error of neural networks for classification problems in terms of data distribution and neural network smoothness. We introduce the cover complexity (CC) to measure the difficulty of learning a data set, and the inverse of the modulus of continuity to quantify neural network smoothness. A quantitative bound for the expected accuracy/error is derived by considering both the CC and the neural network smoothness. Although most of the analysis is general and not specific to neural networks, we validate our theoretical assumptions and results numerically for neural networks on several image data sets. The numerical results confirm that the expected error of trained networks, scaled by the square root of the number of classes, depends linearly on the CC. We also observe a clear consistency between the test loss and the neural network smoothness during training. In addition, we demonstrate empirically that the neural network smoothness decreases as the network size increases, whereas it is insensitive to the training data set size.
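
To orient the reader, here is a minimal sketch of the two quantities named above, in standard notation; the symbols \(\omega_f\), \(\delta\), \(\varepsilon\), \(K\), \(\widetilde{\mathcal{E}}\), \(a\), and \(b\) are illustrative choices, and the paper's precise definitions of the cover complexity and of the smoothness measure may differ in detail. For a network \(f\), the modulus of continuity and its inverse are

\[
\omega_f(\delta) = \sup_{\|x - y\| \le \delta} \|f(x) - f(y)\|,
\qquad
\omega_f^{-1}(\varepsilon) = \sup\{\delta \ge 0 : \omega_f(\delta) \le \varepsilon\},
\]

so a larger inverse modulus corresponds to a smoother network. The empirical trend reported in the abstract then amounts to a linear fit of the form

\[
\widetilde{\mathcal{E}} \approx a \cdot \mathrm{CC} + b,
\]

where \(\widetilde{\mathcal{E}}\) denotes the expected error of the trained network scaled by \(\sqrt{K}\) for a \(K\)-class problem, and \(a\), \(b\) are constants fitted across data sets.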

Authors:
Jin, Pengzhan [1]; Lu, Lu [2]; Tang, Yifa [1]; Karniadakis, George Em [2]
  1. Chinese Academy of Sciences (CAS), Beijing (China); Univ. of Chinese Academy of Sciences, Beijing (China)
  2. Brown Univ., Providence, RI (United States)
Publication Date:
July 3, 2020
Research Org.:
Brown Univ., Providence, RI (United States)
Sponsoring Org.:
USDOE Office of Science (SC); US Air Force Office of Scientific Research (AFOSR); Defense Advanced Research Projects Agency (DARPA); Minister of Science and Technology of China (MOST); National Natural Science Foundation of China (NSFC)
OSTI Identifier:
1853302
Alternate Identifier(s):
OSTI ID: 1637558; OSTI ID: 2281730
Grant/Contract Number:  
SC0019453; FA9550-17-1-0013; HR00111990025; 2018AAA0101002; 11771438
Resource Type:
Accepted Manuscript
Journal Name:
Neural Networks
Additional Journal Information:
Journal Volume: 130; Journal Issue: C; Journal ID: ISSN 0893-6080
Publisher:
Elsevier
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; neural networks; generalization error; learnability; data distribution; cover complexity; neural network smoothness

Citation Formats

Jin, Pengzhan, Lu, Lu, Tang, Yifa, and Karniadakis, George Em. Quantifying the generalization error in deep learning in terms of data distribution and neural network smoothness. United States: N. p., 2020. Web. doi:10.1016/j.neunet.2020.06.024.
Jin, Pengzhan, Lu, Lu, Tang, Yifa, & Karniadakis, George Em. Quantifying the generalization error in deep learning in terms of data distribution and neural network smoothness. United States. https://doi.org/10.1016/j.neunet.2020.06.024
Jin, Pengzhan, Lu, Lu, Tang, Yifa, and Karniadakis, George Em. Fri Jul 03, 2020. "Quantifying the generalization error in deep learning in terms of data distribution and neural network smoothness". United States. https://doi.org/10.1016/j.neunet.2020.06.024. https://www.osti.gov/servlets/purl/1853302.
@article{osti_1853302,
title = {Quantifying the generalization error in deep learning in terms of data distribution and neural network smoothness},
author = {Jin, Pengzhan and Lu, Lu and Tang, Yifa and Karniadakis, George Em},
abstractNote = {We report that the accuracy of deep learning, i.e., deep neural networks, can be characterized by dividing the total error into three main types: approximation error, optimization error, and generalization error. Whereas there are some satisfactory answers to the problems of approximation and optimization, much less is known about the theory of generalization. Most existing theoretical works for generalization fail to explain the performance of neural networks in practice. To derive a meaningful bound, we study the generalization error of neural networks for classification problems in terms of data distribution and neural network smoothness. We introduce the cover complexity (CC) to measure the difficulty of learning a data set and the inverse of the modulus of continuity to quantify neural network smoothness. A quantitative bound for expected accuracy/error is derived by considering both the CC and neural network smoothness. Although most of the analysis is general and not specific to neural networks, we validate our theoretical assumptions and results numerically for neural networks by several data sets of images. The numerical results confirm that the expected error of trained networks scaled with the square root of the number of classes has a linear relationship with respect to the CC. We also observe a clear consistency between test loss and neural network smoothness during the training process. In addition, we demonstrate empirically that the neural network smoothness decreases when the network size increases whereas the smoothness is insensitive to training dataset size.},
doi = {10.1016/j.neunet.2020.06.024},
journal = {Neural Networks},
number = {C},
volume = {130},
place = {United States},
year = {2020},
month = {7}
}
