Accelerating Binarized Neural Networks via Bit-Tensor-Cores in Turing GPUs
Journal Article · IEEE Transactions on Parallel and Distributed Systems
- BATTELLE (PACIFIC NW LAB)
- US Army Research Laboratory (ARL)
Despite the tremendous speedups they promise over conventional deep neural networks, the performance advantage of binarized neural networks (BNNs) has rarely been demonstrated on general-purpose processors such as CPUs and GPUs. In fact, because their word-based architectures cannot exploit bit-level parallelism, GPUs have been criticized for extremely low utilization (about 1%) when executing BNNs. Consequently, the latest tensor cores in NVIDIA Turing GPUs have started to experimentally support bit computation. In this work, we investigate this brand-new bit-computation capability and characterize its unique features. We show that the stride of memory access can significantly affect performance delivery, and that a data-format co-design is highly desired for the tensor cores to achieve performance superior to existing software solutions that do not use tensor cores. We realize the tensor-core-accelerated BNN design, particularly the major functions of the fully-connected and convolution layers: bit matrix multiplication and bit convolution. Evaluations on two NVIDIA Turing GPUs show that, with ResNet-18, our BTC-BNN design can process ImageNet at a rate of 5.6K images per second, 77% faster than the state of the art. Our BNN approach is released at https://github.com/pnnl/TCBNN.
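To make the bit-level parallelism concrete: a standard BNN formulation (not the paper's tensor-core kernel, which uses Turing's hardware bit-matrix-multiply instead) packs {-1, +1} values one per bit and computes inner products with XOR and popcount. A minimal sketch, assuming GCC/Clang for `__builtin_popcountll`:

```cpp
#include <cstdint>

// Illustrative sketch of a binarized inner product, not the paper's code.
// Values in {-1, +1} are packed one per bit (bit 1 -> +1, bit 0 -> -1).
// Each elementwise product is +1 where the bits match and -1 where they
// differ, so over n valid bits:
//   dot = n - 2 * popcount(a XOR b)
inline int bnn_dot(const uint64_t* a, const uint64_t* b, int words, int nbits) {
    int diff = 0;  // number of bit positions where a and b disagree
    for (int i = 0; i < words; ++i)
        diff += __builtin_popcountll(a[i] ^ b[i]);
    return nbits - 2 * diff;
}
```

For example, with a = 1011b (+1, +1, -1, +1, LSB first) and b = 1001b (+1, -1, -1, +1), the products are +1, -1, +1, +1 and `bnn_dot` returns 2. One 64-bit XOR/popcount pair here covers 64 multiply-accumulates, which is the parallelism a word-oriented GPU pipeline wastes when it processes one binary value per word; Turing's bit tensor cores move this XOR/popcount pattern into hardware matrix units.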
- Research Organization:
- Pacific Northwest National Laboratory (PNNL), Richland, WA (United States)
- Sponsoring Organization:
- USDOE
- DOE Contract Number:
- AC05-76RL01830
- OSTI ID:
- 1774004
- Report Number(s):
- PNNL-SA-156570
- Journal Information:
- IEEE Transactions on Parallel and Distributed Systems, Vol. 32, Issue 7
- Country of Publication:
- United States
- Language:
- English
Similar Records
- BSTC: A Novel Binarized-Soft-Tensor-Core Design for Accelerating Bit-Based Approximated Neural Nets (Conference · Nov 16, 2019 · OSTI ID: 1580517)
- GPU Accelerated Singular Binarized Neural Network Inference Framework (Software · Sep 5, 2019 · OSTI ID: code-29001)
- LP-BNN: Ultra-low-Latency BNN Inference with Layer Parallelism (Conference · Sep 5, 2019 · OSTI ID: 1765112)