Accelerating Binarized Neural Networks via Bit-Tensor-Cores in Turing GPUs
Journal Article · IEEE Transactions on Parallel and Distributed Systems
- BATTELLE (PACIFIC NW LAB)
- US Army Research Laboratory (ARL)
Despite the tremendous speedups they promise over conventional deep neural networks, the performance advantage of binarized neural networks (BNNs) has rarely been demonstrated on general-purpose processors such as CPUs and GPUs. In fact, because their word-based architectures cannot exploit bit-level parallelism, GPUs have been criticized for extremely low utilization (about 1%) when executing BNNs. Consequently, the latest tensor cores in NVIDIA Turing GPUs have started to experimentally support bit computation. In this work, we investigate this brand-new bit-computation capability and characterize its unique features. We show that the stride of memory access can significantly affect performance delivery, and that a data-format co-design is highly desired for the tensor cores to achieve performance superior to existing software solutions that do not use tensor cores. We realize the tensor-core-accelerated BNN design, particularly the major functions of the fully-connected and convolution layers: bit matrix multiplication and bit convolution. Evaluations on two NVIDIA Turing GPUs show that, with ResNet-18, our BTC-BNN design can process ImageNet at a rate of 5.6K images per second, 77% faster than the state of the art. Our BNN approach is released at https://github.com/pnnl/TCBNN.
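To make the bit-level parallelism concrete: a standard BNN formulation (not the paper's tensor-core kernel, which uses Turing's hardware bit-matrix-multiply instead) packs {-1, +1} values one per bit and computes inner products with XOR and popcount. A minimal sketch, assuming GCC/Clang for `__builtin_popcountll`:

```cpp
#include <cstdint>

// Illustrative sketch of a binarized inner product, not the paper's code.
// Values in {-1, +1} are packed one per bit (bit 1 -> +1, bit 0 -> -1).
// Each elementwise product is +1 where the bits match and -1 where they
// differ, so over n valid bits:
//   dot = n - 2 * popcount(a XOR b)
inline int bnn_dot(const uint64_t* a, const uint64_t* b, int words, int nbits) {
    int diff = 0;  // number of bit positions where a and b disagree
    for (int i = 0; i < words; ++i)
        diff += __builtin_popcountll(a[i] ^ b[i]);
    return nbits - 2 * diff;
}
```

For example, with a = 1011b (+1, +1, -1, +1, LSB first) and b = 1001b (+1, -1, -1, +1), the products are +1, -1, +1, +1 and `bnn_dot` returns 2. One 64-bit XOR/popcount pair here covers 64 multiply-accumulates, which is the parallelism a word-oriented GPU pipeline wastes when it processes one binary value per word; Turing's bit tensor cores move this XOR/popcount pattern into hardware matrix units.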
- Research Organization:
- Pacific Northwest National Laboratory (PNNL), Richland, WA (United States)
- Sponsoring Organization:
- USDOE
- DOE Contract Number:
- AC05-76RL01830
- OSTI ID:
- 1774004
- Report Number(s):
- PNNL-SA-156570
- Journal Information:
- IEEE Transactions on Parallel and Distributed Systems, Vol. 32, Issue 7
- Country of Publication:
- United States
- Language:
- English
Similar Records
- BSTC: A Novel Binarized-Soft-Tensor-Core Design for Accelerating Bit-Based Approximated Neural Nets (Conference · Nov 16, 2019 · OSTI ID: 1580517)
- GPU Accelerated Singular Binarized Neural Network Inference Framework (Software · Sep 5, 2019 · OSTI ID: code-29001)
- LP-BNN: Ultra-low-Latency BNN Inference with Layer Parallelism (Conference · Sep 5, 2019 · OSTI ID: 1765112)