DOE PAGES
U.S. Department of Energy, Office of Scientific and Technical Information

Title: Scalable FPGA Accelerator for Deep Convolutional Neural Networks with Stochastic Streaming

Abstract

FPGA-based heterogeneous computing platforms, thanks to their extreme logic reconfigurability, have emerged as strong contenders for the computing fabric of modern AI. As a result, various FPGA-based accelerators for deep CNNs—the key driver of modern AI—have been proposed, owing to their high performance, reconfigurability, and fast development cycle. The general consensus among researchers is that, although an FPGA-based accelerator can achieve much higher energy efficiency, its raw computing performance lags behind that of GPUs with similar logic density. In this paper, we develop an alternative methodology for efficiently implementing CNNs on FPGAs that outperforms GPUs in both power consumption and performance. Our key idea is a scalable hardware architecture and circuit design for large-scale CNNs that leverages a stochastic computing principle. Specifically, it offers three major performance advantages. First, all key components of our deep CNN are designed and implemented to compute stochastically, achieving excellent computing performance and energy efficiency. Second, because the proposed architecture enables stream-mode computing, every stage can process partial results from preceding stages, incurring no unnecessary latency due to data dependency. Finally, our FPGA-based deep CNN also provides superior hardware scalability compared with conventional FPGA implementations by reducing the bandwidth requirement between layers. The results show that the proposed architecture significantly outperforms all previous FPGA-based deep CNN implementation approaches, achieving 1.58x more GOPS, 6.42x more GOPS/Slice, and 10.92x more GOPS/W than a state-of-the-art CNN architecture. The top-5 accuracy of the stochastic VGG-16 CNN is 86.77 percent at a frame rate of 18.91 fps.
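The stochastic computing principle the abstract refers to can be sketched in a few lines of software. In unipolar stochastic computing, a value in [0, 1] is encoded as the probability of a 1 appearing in a random bitstream, so multiplication reduces to a single bitwise AND gate per stream pair. The sketch below is an illustrative software model only; the function names are ours, and the paper's actual FPGA circuits are far more involved:

```python
import random

def to_bitstream(p, n, rng):
    # Encode a probability p in [0, 1] as a unipolar stochastic
    # bitstream of length n: each bit is 1 with probability p.
    return [1 if rng.random() < p else 0 for _ in range(n)]

def from_bitstream(bits):
    # Decode a bitstream back to a value: the fraction of 1s.
    return sum(bits) / len(bits)

def stochastic_multiply(a_bits, b_bits):
    # In unipolar stochastic computing, multiplication of two
    # independent streams is just a bitwise AND.
    return [x & y for x, y in zip(a_bits, b_bits)]

rng = random.Random(0)
n = 100_000
a, b = 0.6, 0.5
product = from_bitstream(
    stochastic_multiply(to_bitstream(a, n, rng), to_bitstream(b, n, rng))
)
# product approximates a * b = 0.3; the error shrinks as n grows
```

This trade of a hardware multiplier for a single AND gate (at the cost of long bitstreams and approximate results) is what makes the approach so area- and energy-efficient on FPGA fabric.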

Authors:
Alawad, Mohammed [1]; Lin, Mingjie [2]
  1. Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
  2. Univ. of Central Florida, Orlando, FL (United States)
Publication Date:
2018-12-12
Research Org.:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1493138
Grant/Contract Number:  
AC05-00OR22725
Resource Type:
Accepted Manuscript
Journal Name:
IEEE Transactions on Multi-Scale Computing Systems
Additional Journal Information:
Journal Volume: 4; Journal Issue: 4; Journal ID: ISSN 2372-207X
Publisher:
IEEE
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; convolutional neural network; FPGA; stochastic computing

Citation Formats

Alawad, Mohammed, and Lin, Mingjie. Scalable FPGA Accelerator for Deep Convolutional Neural Networks with Stochastic Streaming. United States: N. p., 2018. Web. doi:10.1109/TMSCS.2018.2886266.
Alawad, Mohammed, & Lin, Mingjie. Scalable FPGA Accelerator for Deep Convolutional Neural Networks with Stochastic Streaming. United States. https://doi.org/10.1109/TMSCS.2018.2886266
Alawad, Mohammed, and Lin, Mingjie. Wed Dec 12, 2018. "Scalable FPGA Accelerator for Deep Convolutional Neural Networks with Stochastic Streaming". United States. https://doi.org/10.1109/TMSCS.2018.2886266. https://www.osti.gov/servlets/purl/1493138.
@article{osti_1493138,
title = {Scalable FPGA Accelerator for Deep Convolutional Neural Networks with Stochastic Streaming},
author = {Alawad, Mohammed and Lin, Mingjie},
abstractNote = {FPGA-based heterogeneous computing platforms, thanks to their extreme logic reconfigurability, have emerged as strong contenders for the computing fabric of modern AI. As a result, various FPGA-based accelerators for deep CNNs—the key driver of modern AI—have been proposed, owing to their high performance, reconfigurability, and fast development cycle. The general consensus among researchers is that, although an FPGA-based accelerator can achieve much higher energy efficiency, its raw computing performance lags behind that of GPUs with similar logic density. In this paper, we develop an alternative methodology for efficiently implementing CNNs on FPGAs that outperforms GPUs in both power consumption and performance. Our key idea is a scalable hardware architecture and circuit design for large-scale CNNs that leverages a stochastic computing principle. Specifically, it offers three major performance advantages. First, all key components of our deep CNN are designed and implemented to compute stochastically, achieving excellent computing performance and energy efficiency. Second, because the proposed architecture enables stream-mode computing, every stage can process partial results from preceding stages, incurring no unnecessary latency due to data dependency. Finally, our FPGA-based deep CNN also provides superior hardware scalability compared with conventional FPGA implementations by reducing the bandwidth requirement between layers. The results show that the proposed architecture significantly outperforms all previous FPGA-based deep CNN implementation approaches, achieving 1.58x more GOPS, 6.42x more GOPS/Slice, and 10.92x more GOPS/W than a state-of-the-art CNN architecture. The top-5 accuracy of the stochastic VGG-16 CNN is 86.77 percent at a frame rate of 18.91 fps.},
doi = {10.1109/TMSCS.2018.2886266},
journal = {IEEE Transactions on Multi-Scale Computing Systems},
number = 4,
volume = 4,
place = {United States},
year = {2018},
month = {12}
}