U.S. Department of Energy
Office of Scientific and Technical Information

Accelerating Binarized Neural Networks via Bit-Tensor-Cores in Turing GPUs

Journal Article · IEEE Transactions on Parallel and Distributed Systems
[1]; [2]
  1. Battelle (Pacific Northwest National Laboratory)
  2. US Army Research Laboratory (ARL)
Although binarized neural networks (BNNs) promise tremendous speedups over conventional deep neural networks, this performance advantage has rarely been demonstrated on general-purpose processors such as CPUs and GPUs. In fact, because their word-based architectures cannot exploit bit-level parallelism, GPUs have been criticized for extremely low utilization (around 1%) when executing BNNs. In response, the latest tensor cores in NVIDIA Turing GPUs have begun to offer experimental support for bit computation. In this work, we investigate this new bit-computation capability and characterize its unique features. We show that memory-access stride can significantly affect delivered performance, and that a data-format co-design is needed for the tensor cores to outperform existing software solutions that do not use them. We then realize a tensor-core-accelerated BNN design, focusing on the major functions of the fully-connected and convolution layers: bit matrix multiplication and bit convolution. Evaluations on two NVIDIA Turing GPUs show that, with ResNet-18, our BTC-BNN design processes ImageNet at 5.6K images per second, 77% faster than the state of the art. Our BNN approach is released at https://github.com/pnnl/TCBNN.
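
The bit-computation capability referred to above is exposed through CUDA's experimental 1-bit (b1) WMMA interface on Turing (sm_75). Below is a minimal sketch, not taken from the paper's code, showing the primitive behind bit matrix multiplication: an 8x8 int32 output tile accumulated as popcount(XOR(A, B)) over a 128-bit K dimension. The kernel name and tile choice are illustrative assumptions; the API calls (load_matrix_sync, bmma_sync, store_matrix_sync) are the documented experimental interface in mma.h.

// Compile with, e.g., nvcc -arch=sm_75; launch with a single warp: bmma8x8x128<<<1, 32>>>(dA, dB, dC);
#include <mma.h>
using namespace nvcuda;

__global__ void bmma8x8x128(const unsigned* A,   // 8x128 bits, row-major, bit-packed
                            const unsigned* B,   // 8x128 bits, col-major, bit-packed
                            int* C)              // 8x8 int32 result, row-major
{
    // Fragments for the experimental 1-bit precision: M = N = 8, K = 128.
    // For b1, matrix_a must be row-major and matrix_b col-major.
    wmma::fragment<wmma::matrix_a, 8, 8, 128,
                   wmma::experimental::precision::b1, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 8, 8, 128,
                   wmma::experimental::precision::b1, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 8, 8, 128, int> c_frag;

    wmma::fill_fragment(c_frag, 0);

    // The leading dimension is given in elements (bits) and must be a
    // multiple of 128 bits; this stride is the memory-access parameter whose
    // performance sensitivity the abstract describes.
    wmma::load_matrix_sync(a_frag, A, 128);
    wmma::load_matrix_sync(b_frag, B, 128);

    // XOR plus population-count accumulation: the binarized dot product.
    wmma::bmma_sync(c_frag, a_frag, b_frag, c_frag);

    wmma::store_matrix_sync(C, c_frag, 8, wmma::mem_row_major);
}
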
Research Organization:
Pacific Northwest National Laboratory (PNNL), Richland, WA (United States)
Sponsoring Organization:
USDOE
DOE Contract Number:
AC05-76RL01830
OSTI ID:
1774004
Report Number(s):
PNNL-SA-156570
Journal Information:
IEEE Transactions on Parallel and Distributed Systems, Vol. 32, Issue 7
Country of Publication:
United States
Language:
English

Similar Records

BSTC: A Novel Binarized-Soft-Tensor-Core Design for Accelerating Bit-Based Approximated Neural Nets
Conference · November 2019 · OSTI ID: 1580517

GPU Accelerated Singular Binarized Neural Network Inference Framework
Software · September 2019 · OSTI ID: code-29001

LP-BNN: Ultra-low-Latency BNN Inference with Layer Parallelism
Conference · September 2019 · OSTI ID: 1765112
