U.S. Department of Energy
Office of Scientific and Technical Information

Runtime extension for neural network training with heterogeneous memory

Patent · OSTI ID: 2293673

Systems, apparatuses, and methods for managing buffers in a neural network implementation with heterogeneous memory are disclosed. A system includes a neural network coupled to a first memory and a second memory. The first memory is a relatively low-capacity, high-bandwidth memory while the second memory is a relatively high-capacity, low-bandwidth memory. During a forward propagation pass of the neural network, a run-time manager monitors the usage of the buffers for the various layers of the neural network. During a backward propagation pass of the neural network, the run-time manager determines how to move the buffers between the first and second memories based on the monitored buffer usage during the forward propagation pass. As a result, the run-time manager is able to reduce memory access latency for the layers of the neural network during the backward propagation pass.
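The scheme described in the abstract can be sketched in a few lines: a run-time manager records which buffers each layer touches during the forward pass, then walks the layers in reverse during the backward pass and moves each layer's buffers into the fast memory when capacity allows. This is a minimal illustrative sketch, not the patented implementation; all names (`RuntimeManager`, `fast_capacity`, the string labels) are assumptions for the example.

```python
class RuntimeManager:
    """Hypothetical sketch of forward-pass monitoring and backward-pass
    buffer placement between a fast (low-capacity, high-bandwidth) and a
    slow (high-capacity, low-bandwidth) memory."""

    def __init__(self, fast_capacity):
        self.fast_capacity = fast_capacity   # bytes available in the fast memory
        self.fast_used = 0
        self.location = {}                   # buffer id -> "fast" or "slow"
        self.forward_trace = []              # (layer, [(buf_id, size), ...]) in forward order

    def record_forward(self, layer, buffers):
        """Monitor buffer usage for one layer during the forward pass."""
        self.forward_trace.append((layer, list(buffers)))
        for buf_id, _size in buffers:
            self.location.setdefault(buf_id, "slow")

    def plan_backward(self):
        """Yield, for each layer in backward order, the buffers to move into
        the fast memory before that layer's backward computation runs."""
        for layer, buffers in reversed(self.forward_trace):
            moves = []
            for buf_id, size in buffers:
                if self.location[buf_id] == "slow" and self._reserve(size):
                    self.location[buf_id] = "fast"
                    moves.append(buf_id)
            yield layer, moves

    def _reserve(self, size):
        """Claim space in the fast memory; fail if it would exceed capacity."""
        if self.fast_used + size <= self.fast_capacity:
            self.fast_used += size
            return True
        return False
```

Because the backward pass visits layers in reverse, the buffers needed soonest are placed in the fast memory first; once it fills, later (earlier-in-forward-order) buffers stay in the slow memory, which is the latency-reduction effect the abstract describes.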

Research Organization:
Lawrence Livermore National Laboratory (LLNL), Livermore, CA (United States); Advanced Micro Devices, Inc., Santa Clara, CA (United States)
Sponsoring Organization:
USDOE
DOE Contract Number:
AC52-07NA27344
Assignee:
Advanced Micro Devices, Inc. (Santa Clara, CA)
Patent Number(s):
11,775,799
Application Number:
16/194,958
OSTI ID:
2293673
Country of Publication:
United States
Language:
English

References (9)

Low Overhead Message Passing for High Performance Many-Core Processors conference December 2013
Profile-guided proactive garbage collection for locality optimization journal June 2006
EIE: Efficient Inference Engine on Compressed Deep Neural Network conference June 2016
Efficient FPGA Acceleration of Convolutional Neural Networks Using Logical-3D Compute Array conference January 2016
F-C3D: FPGA-based 3-dimensional convolutional neural network conference September 2017
Optimal Tiling Strategy for Memory Bandwidth Reduction for CNNs book January 2017
Compressing DMA Engine: Leveraging Activation Sparsity for Training Deep Neural Networks conference February 2018
vDNN: Virtualized deep neural networks for scalable, memory-efficient neural network design conference October 2016
moDNN: Memory optimal DNN training on GPUs conference March 2018

Similar Records

Hybrid memory module bridge network and buffers
Patent · October 9, 2018 · OSTI ID: 1485288

Mechanism for reducing page migration overhead in memory systems
Patent · July 2, 2019 · OSTI ID: 1568554

Sparse Matrix-Matrix Multiplication on Multilevel Memory Architectures: Algorithms and Experiments
Technical Report · April 2, 2018 · OSTI ID: 1435688
