U.S. Department of Energy
Office of Scientific and Technical Information

How does momentum benefit deep neural networks architecture design? A few case studies

Journal Article · Research in the Mathematical Sciences (Print)

Abstract: Not provided.

Research Organization:
Purdue Univ., West Lafayette, IN (United States)
Sponsoring Organization:
USDOE Office of Science (SC)
DOE Contract Number:
SC0021142
OSTI ID:
1976766
Journal Information:
Research in the Mathematical Sciences (Print), Vol. 9, Issue 3; ISSN 2522-0144
Publisher:
SpringerOpen
Country of Publication:
United States
Language:
English

References (33)

ETC: Encoding Long and Structured Inputs in Transformers January 2020
Character-Level Language Modeling with Deeper Self-Attention July 2019
The Heavy Ball with Friction Method, I. The Continuous Dynamical System: Global Exploration of the Local Minima of a Real-Valued Function by Asymptotic Analysis of a Dissipative Dynamical System February 2000
Learning long-term dependencies with gradient descent is difficult March 1994
Towards Non-Saturating Recurrent Units for Modelling Long-Term Dependencies July 2019
Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation January 2014
Transformer-XL: Attentive Language Models beyond a Fixed-Length Context January 2019
Finding Structure in Time March 1990
Stable architectures for deep neural networks December 2017
Deep Residual Learning for Image Recognition June 2016
Identity Mappings in Deep Residual Networks January 2016
Long Short-Term Memory November 1997
Universal Language Model Fine-tuning for Text Classification January 2018
Moses: Open Source Toolkit for Statistical Machine Translation January 2007
Deep learning May 2015
Deterministic Non-Autoregressive Neural Sequence Modeling by Iterative Refinement January 2018
Recurrent neural network based language model September 2010
Bleu: a Method for Automatic Evaluation of Machine Translation January 2001
Some methods of speeding up the convergence of iteration methods January 1964
Mathematical Theory of Optimal Processes May 2018
Blockwise Self-Attention for Long Document Understanding January 2020
SQuAD: 100,000+ Questions for Machine Comprehension of Text January 2016
Efficient Content-Based Sparse Attention with Routing Transformers February 2021
Neural Machine Translation of Rare Words with Subword Units January 2016
Mastering the game of Go without human knowledge October 2017
Implicit Kernel Attention May 2021
Efficient Transformers: A Survey December 2022
MuJoCo: A physics engine for model-based control October 2012
Adversarial defense via the data-dependent activation, total variation minimization, and adversarial training January 2021
Graph interpolating activation improves both natural and robust accuracies in data-efficient deep learning December 2020
A variational perspective on accelerated methods in optimization November 2016
A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference January 2018 (https://doi.org/10.18653/v1/N18-1101)
Nyströmformer: A Nyström-based Algorithm for Approximating Self-Attention May 2021