Scalable FPGA Accelerator for Deep Convolutional Neural Networks with Stochastic Streaming

Alawad, Mohammed; Lin, Mingjie

doi:10.1109/TMSCS.2018.2886266

Journal Article · Dec 12, 2018 · IEEE Transactions on Multi-Scale Computing Systems

DOI: https://doi.org/10.1109/TMSCS.2018.2886266 · OSTI ID: 1493138

[1] Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
[2] Univ. of Central Florida, Orlando, FL (United States)

FPGA-based heterogeneous computing platforms, owing to their extreme logic reconfigurability, have emerged as strong contenders for the computing fabric of modern AI. As a result, various FPGA-based accelerators for deep CNNs, the key driver of modern AI, have been proposed for their high performance, reconfigurability, and fast development cycles. The consensus among researchers is that, although an FPGA-based accelerator can achieve much higher energy efficiency, its raw computing performance lags behind that of GPUs with similar logic density. In this paper, we develop an alternative methodology for implementing CNNs on FPGAs that outperforms GPUs in both power consumption and performance. Our key idea is to design a scalable hardware architecture and circuit design for large-scale CNNs that leverage a stochastic computing principle. Specifically, the design offers three major performance advantages. First, all key components of our deep learning CNN are designed and implemented to compute stochastically, thus achieving excellent computing performance and energy efficiency. Second, because our proposed CNN architecture enables stream-mode computing, every stage can process partial results from preceding stages and therefore incurs no unnecessary latency from data dependencies. Finally, our FPGA-based deep CNN provides superior hardware scalability compared with conventional FPGA implementations by reducing the bandwidth requirement between layers. The results show that our proposed CNN architecture significantly outperforms all previous FPGA-based deep CNN implementations. It achieves 1.58x more GOPS, 6.42x more GOPS/Slice, and 10.92x more GOPS/W than the state-of-the-art CNN architecture. The stochastic VGG-16 CNN reaches a top-5 accuracy of 86.77 percent at a frame rate of 18.91 fps.
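For readers unfamiliar with the stochastic computing principle the abstract refers to, the following is a minimal software sketch of the general idea, not the authors' hardware design. It assumes the common unipolar encoding, in which a value in [0, 1] is represented by a random bitstream whose fraction of 1s equals that value, so a multiplication reduces to a bitwise AND of two independent streams. The function names, stream length, and seed are illustrative assumptions that do not appear in the record.

```python
import random


def to_stream(p, n, rng):
    """Encode a value p in [0, 1] as n random bits, each 1 with probability p."""
    return [1 if rng.random() < p else 0 for _ in range(n)]


def stream_value(bits):
    """Decode a bitstream: the fraction of 1s estimates the encoded value."""
    return sum(bits) / len(bits)


def stochastic_multiply(a, b, n=4096, seed=0):
    """Estimate a * b by ANDing two independent unipolar bitstreams."""
    rng = random.Random(seed)
    stream_a = to_stream(a, n, rng)
    stream_b = to_stream(b, n, rng)
    # For independent streams, P(bit_a AND bit_b = 1) = a * b.
    product_bits = [x & y for x, y in zip(stream_a, stream_b)]
    return stream_value(product_bits)


def prefix_estimates(a, b, n=4096, step=1024, seed=0):
    """Return product estimates taken from growing prefixes of the stream.

    Illustrates why a downstream stage can start from partial results:
    any prefix of the product stream is already a coarse estimate of a * b.
    """
    rng = random.Random(seed)
    stream_a = to_stream(a, n, rng)
    stream_b = to_stream(b, n, rng)
    product_bits = [x & y for x, y in zip(stream_a, stream_b)]
    return [stream_value(product_bits[:k]) for k in range(step, n + 1, step)]


if __name__ == "__main__":
    print(stochastic_multiply(0.5, 0.6))  # an estimate close to 0.3
    print(prefix_estimates(0.5, 0.6))     # estimates from successively longer prefixes
```

The estimate is approximate by construction: longer streams reduce the error at the cost of more cycles, which is the usual accuracy-versus-latency trade-off in stochastic computing.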

Research Organization: Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)

Sponsoring Organization: USDOE

Grant/Contract Number: AC05-00OR22725

OSTI ID: 1493138

Journal Information: IEEE Transactions on Multi-Scale Computing Systems, Vol. 4, Issue 4; ISSN 2372-207X

Publisher: IEEE

Country of Publication: United States

Language: English

Similar Records

FPDeep: Scalable Acceleration of CNN Training on Deeply-Pipelined FPGA Clusters

Journal Article · Aug 1, 2020 · IEEE Transactions on Computers

Wang, Tianqi; Geng, Tong; Li, Ang; +2 more

Enabling Highly Efficient Capsule Networks Processing Through A PIM-Based Architecture Design

Conference · Feb 26, 2020

Zhang, Xingyao; Song, Shuaiwen; Xie, Chenhao; +3 more

Real-time data analysis for medical diagnosis using FPGA-accelerated neural networks

Journal Article · Dec 21, 2018 · BMC Bioinformatics

Sanaullah, Ahmed; Yang, Chen; Alexeev, Yuri; +2 more

Related Subjects

97 MATHEMATICS AND COMPUTING
convolutional neural network
FPGA
stochastic computing
