OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Fine-Grained Exploitation of Mixed Precision for Faster CNN Training

Abstract

As deep convolutional neural networks (CNNs) have become increasingly popular and successful at an ever-widening range of machine learning tasks, specialized hardware has become increasingly available for training and deploying them. NVIDIA's recent Volta architecture includes tensor cores, which perform a fused multiply-add operation in reduced and mixed precision (16-bit multiply, 32-bit accumulate). Recent research indicates that, typically, very little training accuracy is lost when half precision is used in place of single precision, and that performance gains can be made by doing arithmetic in reduced precision. In this work we demonstrate that making layer-by-layer choices of arithmetic/data precision can lead to further performance improvement. In our study of 25,200 CNNs we demonstrate an average speedup (over purely half precision) of 1.27x, and speedups as high as 3.64x, by appropriately combining single- and half-precision arithmetic and data types on a layer-by-layer basis.
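The record itself includes no code, but the abstract's core idea, choosing a compute precision per layer, can be sketched briefly. The following PyTorch snippet is an illustration only, not the authors' implementation: each convolution is assigned its own dtype (float16 or float32), and activations are cast at layer boundaries. The class and parameter names here are hypothetical, and a CUDA-capable GPU is assumed, since half-precision convolutions are what map onto Volta's tensor cores.

# Hypothetical sketch of layer-by-layer mixed precision; not the authors' code.
import torch
import torch.nn as nn

class MixedPrecisionCNN(nn.Module):
    def __init__(self, layer_dtypes=(torch.float16, torch.float32, torch.float16)):
        super().__init__()
        self.layers = nn.ModuleList([
            nn.Conv2d(3, 16, 3, padding=1),
            nn.Conv2d(16, 32, 3, padding=1),
            nn.Conv2d(32, 64, 3, padding=1),
        ])
        self.dtypes = layer_dtypes
        # Cast each layer's parameters to its assigned precision.
        for layer, dt in zip(self.layers, self.dtypes):
            layer.to(dt)

    def forward(self, x):
        # Cast activations at each layer boundary to match that layer's dtype.
        for layer, dt in zip(self.layers, self.dtypes):
            x = torch.relu(layer(x.to(dt)))
        return x.float()  # hand the final result back in float32

model = MixedPrecisionCNN().cuda()
out = model(torch.randn(8, 3, 32, 32, device="cuda"))
print(out.dtype, out.shape)  # torch.float32 torch.Size([8, 64, 32, 32])

Training such a network in earnest would also require the usual mixed-precision safeguards, such as loss scaling, which this sketch omits for brevity.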

Authors:
Johnston, Travis [1]; Young, Steven [1]; Schuman, Catherine [1]; Chae, Junghoon [1]; March, Don [1]; Patton, Robert [1]; Potok, Thomas [1]
  1. ORNL
Publication Date:
November 2019
Research Org.:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1608214
DOE Contract Number:  
AC05-00OR22725
Resource Type:
Conference
Resource Relation:
Conference: IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments (MLHPC), Denver, Colorado, United States of America, Nov. 17-18, 2019
Country of Publication:
United States
Language:
English

Citation Formats

Johnston, Travis, Young, Steven, Schuman, Catherine, Chae, Junghoon, March, Don, Patton, Robert, and Potok, Thomas. Fine-Grained Exploitation of Mixed Precision for Faster CNN Training. United States: N. p., 2019. Web. doi:10.1109/MLHPC49564.2019.00007.
Johnston, Travis, Young, Steven, Schuman, Catherine, Chae, Junghoon, March, Don, Patton, Robert, & Potok, Thomas. Fine-Grained Exploitation of Mixed Precision for Faster CNN Training. United States. doi:10.1109/MLHPC49564.2019.00007.
Johnston, Travis, Young, Steven, Schuman, Catherine, Chae, Junghoon, March, Don, Patton, Robert, and Potok, Thomas. 2019. "Fine-Grained Exploitation of Mixed Precision for Faster CNN Training". United States. doi:10.1109/MLHPC49564.2019.00007. https://www.osti.gov/servlets/purl/1608214.
@article{osti_1608214,
title = {Fine-Grained Exploitation of Mixed Precision for Faster CNN Training},
author = {Johnston, Travis and Young, Steven and Schuman, Catherine and Chae, Junghoon and March, Don and Patton, Robert and Potok, Thomas},
abstractNote = {As deep convolutional neural networks (CNNs) have become increasingly popular and successful at an ever-widening range of machine learning tasks, specialized hardware has become increasingly available for training and deploying them. NVIDIA's recent Volta architecture includes tensor cores, which perform a fused multiply-add operation in reduced and mixed precision (16-bit multiply, 32-bit accumulate). Recent research indicates that, typically, very little training accuracy is lost when half precision is used in place of single precision, and that performance gains can be made by doing arithmetic in reduced precision. In this work we demonstrate that making layer-by-layer choices of arithmetic/data precision can lead to further performance improvement. In our study of 25,200 CNNs we demonstrate an average speedup (over purely half precision) of 1.27x, and speedups as high as 3.64x, by appropriately combining single- and half-precision arithmetic and data types on a layer-by-layer basis.},
doi = {10.1109/MLHPC49564.2019.00007},
url = {https://www.osti.gov/servlets/purl/1608214},
place = {United States},
year = {2019},
month = {11}
}

Other availability
Please see Document Availability for additional information on obtaining the full-text document. Library patrons may search WorldCat to identify libraries that hold this conference proceeding.
