DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Fast neural network training on a cluster of GPUs for action recognition with high accuracy

Abstract

In this work, we propose algorithms and techniques to accelerate training of deep neural networks for action recognition on a cluster of GPUs. The convergence analysis of our algorithm shows that it is possible to reduce communication cost and, at the same time, minimize the number of iterations needed for convergence. We customize the Adam optimizer for our distributed algorithm to improve efficiency. In addition, we employ transfer learning to further reduce training time while improving validation accuracy. For the UCF101 and HMDB51 datasets, the validation accuracies achieved are 93.1% and 67.9%, respectively. With an additional end-to-end trained temporal stream, the validation accuracies achieved for UCF101 and HMDB51 are 93.47% and 81.24%, respectively. As far as we know, these are the highest accuracies achieved with a two-stream ResNet approach that involves neither computationally expensive 3D convolutions nor pretraining on much larger datasets.
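
The record gives only the abstract, but the data-parallel pattern it describes (a customized Adam optimizer combined with reduced inter-GPU communication) can be illustrated with a minimal sketch. The PyTorch code below shows the generic baseline such a scheme starts from: gradients averaged across workers with one all-reduce per parameter, followed by a local Adam step. The function name distributed_adam_step and the all-reduce placement are illustrative assumptions; the paper's actual Adam customization and communication-reduction scheme are not detailed in this record.

    import torch
    import torch.distributed as dist

    def distributed_adam_step(model, optimizer):
        # Average gradients across all workers: sum via one all-reduce per
        # parameter, then divide by the number of workers. Every replica
        # then applies an identical local Adam update.
        world_size = dist.get_world_size()
        for p in model.parameters():
            if p.grad is not None:
                dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
                p.grad /= world_size
        optimizer.step()      # local Adam update, identical on all ranks
        optimizer.zero_grad()

    # Typical use, one process per GPU (e.g. launched with torchrun):
    #   dist.init_process_group("nccl")
    #   optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    #   loss.backward()
    #   distributed_adam_step(model, optimizer)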

Authors:
Cong, G. [1]; Domeniconi, G. [1]; Yang, C. [1]; Shapiro, J. [2]; Zhou, F. [3]; Chen, B. Y. [4]
  1. IBM TJ Watson Research Center, Yorktown Heights, NY (United States)
  2. ASAPP, New York, NY (United States)
  3. Baidu Research USA, Bellevue, WA (United States)
  4. Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
Publication Date:
August 29, 2019
Research Org.:
Lawrence Livermore National Laboratory (LLNL), Livermore, CA (United States)
Sponsoring Org.:
USDOE National Nuclear Security Administration (NNSA)
OSTI Identifier:
1669241
Alternate Identifier(s):
OSTI ID: 2325457
Report Number(s):
LLNL-JRNL-814435
Journal ID: ISSN 0743-7315; 1021930
Grant/Contract Number:  
AC52-07NA27344
Resource Type:
Accepted Manuscript
Journal Name:
Journal of Parallel and Distributed Computing
Additional Journal Information:
Journal Volume: 134; Journal Issue: na; Journal ID: ISSN 0743-7315
Publisher:
Elsevier
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; Machine learning; video analytics; distributed training; transfer learning; GPU

Citation Formats

Cong, G, Domeniconi, G, Yang, C, Shapiro, J, Zhou, F, and Chen, B Y. Fast neural network training on a cluster of GPUs for action recognition with high accuracy. United States: N. p., 2019. Web. doi:10.1016/j.jpdc.2019.07.009.
Cong, G, Domeniconi, G, Yang, C, Shapiro, J, Zhou, F, & Chen, B Y. Fast neural network training on a cluster of GPUs for action recognition with high accuracy. United States. https://doi.org/10.1016/j.jpdc.2019.07.009
Cong, G, Domeniconi, G, Yang, C, Shapiro, J, Zhou, F, and Chen, B Y. 2019. "Fast neural network training on a cluster of GPUs for action recognition with high accuracy". United States. https://doi.org/10.1016/j.jpdc.2019.07.009. https://www.osti.gov/servlets/purl/1669241.
@article{osti_1669241,
title = {Fast neural network training on a cluster of GPUs for action recognition with high accuracy},
author = {Cong, G and Domeniconi, G and Yang, C and Shapiro, J and Zhou, F and Chen, B Y},
abstractNote = {In this work, we propose algorithms and techniques to accelerate training of deep neural networks for action recognition on a cluster of GPUs. The convergence analysis of our algorithm shows that it is possible to reduce communication cost and, at the same time, minimize the number of iterations needed for convergence. We customize the Adam optimizer for our distributed algorithm to improve efficiency. In addition, we employ transfer learning to further reduce training time while improving validation accuracy. For the UCF101 and HMDB51 datasets, the validation accuracies achieved are 93.1% and 67.9%, respectively. With an additional end-to-end trained temporal stream, the validation accuracies achieved for UCF101 and HMDB51 are 93.47% and 81.24%, respectively. As far as we know, these are the highest accuracies achieved with a two-stream ResNet approach that involves neither computationally expensive 3D convolutions nor pretraining on much larger datasets.},
doi = {10.1016/j.jpdc.2019.07.009},
journal = {Journal of Parallel and Distributed Computing},
number = {na},
volume = {134},
place = {United States},
year = {2019},
month = {8}
}

Figures / Tables:

Figure 1: Two-stream training architecture
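
The figure itself is not reproduced in this record. As a rough illustration of the two-stream design the caption refers to, the sketch below builds a spatial stream over RGB frames and a temporal stream over stacked optical-flow fields, each a ResNet initialized from ImageNet weights (the transfer learning the abstract mentions), and averages their class scores. ResNet-34, the 10-frame flow stack (20 input channels), and score-level fusion are common two-stream conventions assumed here, not details confirmed by the record.

    import torch
    import torch.nn as nn
    import torchvision.models as models

    class TwoStreamResNet(nn.Module):
        def __init__(self, num_classes=101, flow_channels=20):
            super().__init__()
            # Transfer learning: initialize both streams from ImageNet weights.
            weights = models.ResNet34_Weights.IMAGENET1K_V1
            self.spatial = models.resnet34(weights=weights)
            self.temporal = models.resnet34(weights=weights)
            # The temporal stream consumes stacked optical flow, not 3-channel
            # images, so its first convolution is re-created and trained anew.
            self.temporal.conv1 = nn.Conv2d(flow_channels, 64, kernel_size=7,
                                            stride=2, padding=3, bias=False)
            # Replace the 1000-way ImageNet heads with the action classes.
            self.spatial.fc = nn.Linear(self.spatial.fc.in_features, num_classes)
            self.temporal.fc = nn.Linear(self.temporal.fc.in_features, num_classes)

        def forward(self, rgb, flow):
            # Late fusion: average the two streams' class scores.
            return (self.spatial(rgb) + self.temporal(flow)) / 2

    model = TwoStreamResNet(num_classes=101)      # UCF101 has 101 classes
    scores = model(torch.randn(2, 3, 224, 224),   # batch of RGB frames
                   torch.randn(2, 20, 224, 224))  # batch of flow stacks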

Works referenced in this record:

Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
conference, July 2017

  • Carreira, Joao; Zisserman, Andrew
  • 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
  • DOI: 10.1109/CVPR.2017.502

Deep Residual Learning for Image Recognition
conference, June 2016

  • He, Kaiming; Zhang, Xiangyu; Ren, Shaoqing
  • 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
  • DOI: 10.1109/CVPR.2016.90

FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks
conference, July 2017

  • Ilg, Eddy; Mayer, Nikolaus; Saikia, Tonmoy
  • 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
  • DOI: 10.1109/CVPR.2017.179

ImageNet classification with deep convolutional neural networks
journal, May 2017

  • Krizhevsky, Alex; Sutskever, Ilya; Hinton, Geoffrey E.
  • Communications of the ACM, Vol. 60, Issue 6
  • DOI: 10.1145/3065386

TV-L1 Optical Flow Estimation
journal, January 2013

  • Sánchez Pérez, Javier; Meinhardt-Llopis, Enric; Facciolo, Gabriele
  • Image Processing On Line, Vol. 3
  • DOI: 10.5201/ipol.2013.26

Going deeper with convolutions
conference, June 2015

  • Szegedy, Christian; Liu, Wei; Jia, Yangqing
  • 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
  • DOI: 10.1109/CVPR.2015.7298594