O3BNN-R: An Out-Of-Order Architecture for High-Performance and Regularized BNN Inference
Journal Article · IEEE Transactions on Parallel and Distributed Systems
- Boston University
- BATTELLE (PACIFIC NW LAB)
- Zhejiang University
- University of Hong Kong
- Los Alamos National Laboratory
Binarized Neural Networks (BNNs) have drawn tremendous attention due to their significantly reduced computational complexity and memory demand. They show particular promise in cost- and power-restricted domains, such as IoT and smart edge devices, where reaching a given accuracy bar is often sufficient and real-time performance is highly desired. In this work, we demonstrate that the already highly condensed BNN model can be shrunk significantly further by dynamically pruning irregular, redundant edges. Based on two new observations of BNN-specific properties, an out-of-order (OoO) architecture, O3BNN-R, can curtail edge evaluation whenever the binary output of a neuron can be determined early. As with Instruction-Level Parallelism (ILP), such fine-grained, irregular, runtime pruning opportunities are traditionally presumed difficult to exploit. To increase the pruning opportunities, we also optimize the training process by adding two regularization terms to the loss function: one for pooling pruning and one for threshold pruning. We evaluate our design on an FPGA platform using three well-known networks: VggNet-16 and AlexNet for ImageNet, and a VGG-like network for Cifar-10.
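The two pruning opportunities described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation (which is an FPGA hardware architecture); it is a hypothetical software analogue, assuming a BNN neuron that fires when its XNOR-popcount accumulation exceeds a threshold, and a binary max-pool whose output is 1 as soon as any input is 1. All names are illustrative.

```python
def binary_neuron_early_exit(weights, inputs, threshold):
    """Threshold-pruning sketch (illustrative, not the paper's design):
    evaluate binary edges one at a time and stop as soon as the sign
    of the final result is guaranteed, no matter how the remaining
    edges turn out. Returns (binary output, edges actually evaluated)."""
    acc = 0
    remaining = len(weights)
    for w, x in zip(weights, inputs):
        remaining -= 1
        acc += 1 if w == x else -1  # XNOR contribution in {-1, +1}
        # Even if every remaining edge contributes -1, acc stays above
        # the threshold: the output must be 1, so stop early.
        if acc - remaining > threshold:
            return 1, len(weights) - remaining
        # Even if every remaining edge contributes +1, acc cannot exceed
        # the threshold: the output must be 0, so stop early.
        if acc + remaining <= threshold:
            return 0, len(weights) - remaining
    return (1 if acc > threshold else 0), len(weights)


def binary_maxpool_early_exit(window):
    """Pooling-pruning sketch: a max-pool over binary activations
    outputs 1 the moment any input is 1, so the remaining window
    entries (and the neurons that produce them) need not be evaluated.
    Returns (binary output, inputs actually examined)."""
    for i, v in enumerate(window, 1):
        if v == 1:
            return 1, i
    return 0, len(window)
```

For example, with 8 fully matching edges and threshold 0, the neuron's output is decided after only 5 of the 8 edges; the regularization terms the abstract mentions are intended to make such early decisions more frequent.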
- Research Organization:
- Pacific Northwest National Laboratory (PNNL), Richland, WA (United States)
- Sponsoring Organization:
- USDOE
- DOE Contract Number:
- AC05-76RL01830
- OSTI ID:
- 1670985
- Report Number(s):
- PNNL-SA-148318
- Journal Information:
- IEEE Transactions on Parallel and Distributed Systems, Vol. 32, Issue 1
- Country of Publication:
- United States
- Language:
- English
Similar Records
O3BNN: An Out-Of-Order Architecture for High-Performance Binarized Neural Network Inference with Fine-Grained Pruning
Conference · August 14, 2019 · OSTI ID: 1764982
LP-BNN: Ultra-low-Latency BNN Inference with Layer Parallelism
Conference · September 5, 2019 · OSTI ID: 1765112
BSTC: A Novel Binarized-Soft-Tensor-Core Design for Accelerating Bit-Based Approximated Neural Nets
Conference · November 16, 2019 · OSTI ID: 1580517