Runtime extension for neural network training with heterogeneous memory
Abstract
Systems, apparatuses, and methods for managing buffers in a neural network implementation with heterogeneous memory are disclosed. A system includes a neural network coupled to a first memory and a second memory. The first memory is a relatively low-capacity, high-bandwidth memory while the second memory is a relatively high-capacity, low-bandwidth memory. During a forward propagation pass of the neural network, a run-time manager monitors the usage of the buffers for the various layers of the neural network. During a backward propagation pass of the neural network, the run-time manager determines how to move the buffers between the first and second memories based on the monitored buffer usage during the forward propagation pass. As a result, the run-time manager is able to reduce memory access latency for the layers of the neural network during the backward propagation pass.
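The monitor-then-move flow described in the abstract can be illustrated with a minimal sketch. It is not the patented method: the abstract does not specify a placement policy, so the class `RuntimeManager`, its methods, the byte sizes, and the evict-earliest-fetched rule are all assumptions made for illustration. The sketch records buffer usage per layer during the forward pass, then walks that record in reverse to decide which buffers to fetch into the first (high-bandwidth) memory and which to evict back to the second memory.

```python
from dataclasses import dataclass, field


@dataclass
class RuntimeManager:
    """Hypothetical run-time manager; every name and the policy are assumptions."""

    fast_capacity: int  # bytes available in the first (high-bandwidth) memory
    usage_log: list = field(default_factory=list)  # (layer, buffer, size), forward order
    fast: dict = field(default_factory=dict)  # buffers resident in the first memory

    def record_forward(self, layer: int, buffer: str, size: int) -> None:
        """Forward pass: only monitor which layer uses which buffer."""
        self.usage_log.append((layer, buffer, size))

    def plan_backward(self) -> list:
        """Backward pass: visit layers in reverse and decide buffer moves."""
        moves = []
        for _layer, buffer, size in reversed(self.usage_log):
            if buffer in self.fast or size > self.fast_capacity:
                continue  # already resident, or too large: stays in the second memory
            # Assumed policy: evict the buffers fetched earliest, since the
            # backward pass has already moved past the layers that used them.
            while self.fast and sum(self.fast.values()) + size > self.fast_capacity:
                victim = next(iter(self.fast))
                del self.fast[victim]
                moves.append(("evict", victim))
            self.fast[buffer] = size
            moves.append(("fetch", buffer))
        return moves


manager = RuntimeManager(fast_capacity=100)
for layer, (name, size) in enumerate([("act0", 60), ("act1", 60), ("act2", 30)]):
    manager.record_forward(layer, name, size)
for move in manager.plan_backward():
    print(move)
```

Because the backward pass visits layers in the opposite order from the forward pass, the buffers recorded last are the first candidates for the fast memory. A real implementation would likely also weigh access counts and transfer cost, which this sketch ignores.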
- Inventors: Mappouras, Georgios; Farmahini-Farahani, Amin; Gurumurthi, Sudhanva; Vishnu, Abhinav; Loh, Gabriel H.
- Issue Date:
- Research Org.: Lawrence Livermore National Laboratory (LLNL), Livermore, CA (United States); Advanced Micro Devices, Inc., Santa Clara, CA (United States)
- Sponsoring Org.: USDOE
- OSTI Identifier: 2293673
- Patent Number(s): 11775799
- Application Number: 16/194,958
- Assignee: Advanced Micro Devices, Inc. (Santa Clara, CA)
- Patent Classifications (CPCs):
  - G - PHYSICS; G06 - COMPUTING; G06F - ELECTRIC DIGITAL DATA PROCESSING
  - G - PHYSICS; G06 - COMPUTING; G06N - COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- DOE Contract Number: AC52-07NA27344; B620717
- Resource Type: Patent
- Resource Relation: Patent File Date: 11/19/2018
- Country of Publication: United States
- Language: English
Citation Formats
Mappouras, Georgios, Farmahini-Farahani, Amin, Gurumurthi, Sudhanva, Vishnu, Abhinav, and Loh, Gabriel H. Runtime extension for neural network training with heterogeneous memory. United States: N. p., 2023. Web.
Mappouras, Georgios, Farmahini-Farahani, Amin, Gurumurthi, Sudhanva, Vishnu, Abhinav, & Loh, Gabriel H. Runtime extension for neural network training with heterogeneous memory. United States.
Mappouras, Georgios, Farmahini-Farahani, Amin, Gurumurthi, Sudhanva, Vishnu, Abhinav, and Loh, Gabriel H. 2023. "Runtime extension for neural network training with heterogeneous memory". United States. https://www.osti.gov/servlets/purl/2293673.
@article{osti_2293673,
title = {Runtime extension for neural network training with heterogeneous memory},
author = {Mappouras, Georgios and Farmahini-Farahani, Amin and Gurumurthi, Sudhanva and Vishnu, Abhinav and Loh, Gabriel H.},
place = {United States},
year = {2023},
month = {10}
}