OSTI.GOV · U.S. Department of Energy
Office of Scientific and Technical Information

Title: Scalable FPGA Accelerator for Deep Convolutional Neural Networks with Stochastic Streaming

Journal Article · IEEE Transactions on Multi-Scale Computing Systems

FPGA-based heterogeneous computing platforms, owing to their extreme logic reconfigurability, have emerged as strong contenders for the computing fabric of modern AI. As a result, various FPGA-based accelerators for deep CNNs, the key driver of modern AI, have been proposed, citing advantages such as high performance, reconfigurability, and fast development cycles. The general consensus among researchers is that, although FPGA-based accelerators can achieve much higher energy efficiency, their raw computing performance lags behind that of GPUs with similar logic density. In this paper, we develop an alternative methodology for implementing CNNs efficiently on FPGAs in a way that outperforms GPUs in terms of both power consumption and performance. Our key idea is to design a scalable hardware architecture and circuit design for large-scale CNNs that leverages a stochastic-based computing principle. Specifically, there are three major performance advantages. First, all key components of our deep learning CNN are designed and implemented to compute stochastically, thus achieving excellent computing performance and energy efficiency. Second, because our proposed CNN architecture enables stream-mode computing, each of its stages can process even partial results from preceding stages, and therefore incurs no unnecessary latency due to data dependencies. Finally, our FPGA-based deep CNN also provides superior hardware scalability compared with conventional FPGA implementations by reducing the bandwidth requirement between layers. The results show that our proposed CNN architecture significantly outperforms all previous FPGA-based deep CNN implementation approaches. It achieves 1.58x more GOPS, 6.42x more GOPS/Slice, and 10.92x more GOPS/W than the state-of-the-art CNN architecture. The top-5 accuracy of the stochastic VGG-16 CNN is 86.77 percent at a frame rate of 18.91 fps.
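
The abstract does not state which stochastic encoding or circuits the authors use. As a rough illustration of the general stochastic computing principle only, the sketch below assumes the common unipolar encoding, in which a value in [0, 1] is represented by the fraction of 1s in a random bitstream and multiplication reduces to a bitwise AND. All function names (`to_stream`, `sc_multiply`, `running_estimates`) and the chosen stream length are hypothetical and are not taken from the paper.

```python
import random

def to_stream(p, n, rng):
    """Encode a value p in [0, 1] as n random bits, each 1 with probability p
    (unipolar encoding; an assumption, not necessarily the paper's scheme)."""
    return [1 if rng.random() < p else 0 for _ in range(n)]

def stream_value(bits):
    """Decode a bitstream: the fraction of 1s approximates the encoded value."""
    return sum(bits) / len(bits)

def sc_multiply(a_bits, b_bits):
    """Multiply two independent unipolar streams with one AND per bit pair."""
    return [x & y for x, y in zip(a_bits, b_bits)]

def running_estimates(bits, checkpoints):
    """Decode only the first k bits for each k in checkpoints, showing that a
    downstream consumer can work with a partial stream."""
    return [sum(bits[:k]) / k for k in checkpoints]

rng = random.Random(0)          # fixed seed so the run is repeatable
n = 4096                        # stream length (illustrative choice)
a = to_stream(0.5, n, rng)
b = to_stream(0.25, n, rng)
prod = sc_multiply(a, b)        # encodes approximately 0.5 * 0.25
estimate = stream_value(prod)
partial = running_estimates(prod, [64, 512, n])
print(estimate, partial)
```

Because every prefix of the product stream is itself a coarser estimate of the result, a later stage could begin consuming bits before the stream completes, which is one plausible reading of the stream-mode behavior described in the abstract. Longer streams trade latency for precision.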

Research Organization:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Sponsoring Organization:
USDOE
Grant/Contract Number:
AC05-00OR22725
OSTI ID:
1493138
Journal Information:
IEEE Transactions on Multi-Scale Computing Systems, Vol. 4, Issue 4; ISSN 2372-207X
Publisher:
IEEE
Country of Publication:
United States
Language:
English

Similar Records

FPDeep: Scalable Acceleration of CNN Training on Deeply-Pipelined FPGA Clusters
Journal Article · Aug 1, 2020 · IEEE Transactions on Computers · OSTI ID:1493138

Enabling Highly Efficient Capsule Networks Processing Through A PIM-Based Architecture Design
Conference · Feb 26, 2020 · OSTI ID:1493138

Real-time data analysis for medical diagnosis using FPGA-accelerated neural networks
Journal Article · Dec 21, 2018 · BMC Bioinformatics · OSTI ID:1493138