U.S. Department of Energy
Office of Scientific and Technical Information

Scaling deep learning on GPU and Knights Landing clusters

Journal Article · International Conference for High Performance Computing, Networking, Storage and Analysis
You, Yang [1]; Buluç, Aydın [1]; Demmel, James [1]
  1. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)

Training neural networks has become a major bottleneck; for example, training on the ImageNet dataset with one Nvidia K20 GPU takes 21 days. To speed up training, current deep learning systems rely heavily on hardware accelerators. However, these accelerators have limited on-chip memory compared with CPUs. We use both self-hosted Intel Knights Landing (KNL) clusters and multi-GPU clusters as our target platforms. On the algorithm side, we focus on Elastic Averaging SGD (EASGD) to design algorithms for HPC clusters. We redesign four efficient algorithms for HPC systems to improve EASGD's poor scaling on clusters. Async EASGD, Async MEASGD, and Hogwild EASGD are faster than their existing counterparts (Async SGD, Async MSGD, and Hogwild SGD) in all comparisons. Sync EASGD achieves a 5.3x speedup over the original EASGD on the same platform. We achieve 91.5% weak scaling efficiency on 4253 KNL cores, which is higher than the state-of-the-art implementation.
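As context for the algorithms named in the abstract, the following is a minimal sketch of one synchronous EASGD step, following the update rule of Zhang et al. (2015); the function name, array shapes, and hyperparameter values are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def sync_easgd_step(workers, center, grads, lr=0.01, rho=0.1):
    """One synchronous EASGD update (Zhang et al., 2015).

    workers : list of per-worker parameter arrays x_i
    center  : shared center variable x~ (same shape as each x_i)
    grads   : per-worker stochastic gradients g_i
    The elastic term rho * (x_i - x~) pulls each worker toward
    the center while the center drifts toward the workers' mean.
    """
    alpha = lr * rho  # "moving rate" coupling workers and center
    new_workers = [x - lr * g - alpha * (x - center)
                   for x, g in zip(workers, grads)]
    new_center = center + alpha * sum(x - center for x in workers)
    return new_workers, new_center

# Toy usage: 4 workers optimizing a 3-parameter model.
rng = np.random.default_rng(0)
workers = [rng.standard_normal(3) for _ in range(4)]
center = np.zeros(3)
grads = [2 * x for x in workers]  # gradients of f(x) = ||x||^2
workers, center = sync_easgd_step(workers, center, grads)
```

The paper's contribution lies in how this update is scheduled and communicated across KNL and GPU cluster nodes (the synchronous, asynchronous, momentum, and Hogwild variants above); the sketch covers only the per-step arithmetic.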

Research Organization:
Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR) (SC-21)
DOE Contract Number:
AC02-05CH11231
OSTI ID:
1439212
Journal Information:
International Conference for High Performance Computing, Networking, Storage and Analysis; Journal Volume: 2017; Journal Issue: 9; ISSN 2167-4329
Publisher:
IEEE
Country of Publication:
United States
Language:
English

References (13)

ImageNet: A large-scale hierarchical image database
  • Deng, Jia; Dong, Wei; Socher, Richard
  • 2009 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) https://doi.org/10.1109/CVPR.2009.5206848
conference June 2009
CA-SVM: Communication-Avoiding Support Vector Machines on Distributed Systems conference May 2015
Deep Residual Learning for Image Recognition preprint January 2015
FireCaffe: Near-Linear Acceleration of Deep Neural Network Training on Compute Clusters conference June 2016
Deep Residual Learning for Image Recognition conference June 2016
Optimizing Memory Efficiency for Deep Convolutional Neural Networks on GPUs conference November 2016
1-bit stochastic gradient descent and its application to data-parallel distributed training of speech DNNs conference September 2014
Optimizing Memory Efficiency for Deep Convolutional Neural Networks on GPUs preprint January 2016
Gradient-based learning applied to document recognition journal January 1998
Going Deeper with Convolutions preprint January 2014
Going deeper with convolutions conference June 2015
Efficient mini-batch training for stochastic optimization conference January 2014
Efficiency Optimization of Trainable Feature Extractors for a Consumer Platform book January 2011

Cited By (6)

StreamBox-HBM: Stream Analytics on High Bandwidth Hybrid Memory
  • Miao, Hongyu; Jeon, Myeongjae; Pekhimenko, Gennady
  • ASPLOS '19: Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems https://doi.org/10.1145/3297858.3304031
conference April 2019
Reducing Data Motion to Accelerate the Training of Deep Neural Networks preprint January 2020
GossipGraD: Scalable Deep Learning using Gossip Communication based Asynchronous Gradient Descent preprint January 2018
Efficient MPI‐AllReduce for large‐scale deep learning on GPU‐clusters journal December 2019
StreamBox-HBM: Stream Analytics on High Bandwidth Hybrid Memory text January 2019
A Survey on Distributed Machine Learning journal March 2021

Similar Records

Scaling Deep Learning on GPU and Knights Landing clusters
Journal Article · September 26, 2017 · International Conference for High Performance Computing, Networking, Storage and Analysis (Online) · OSTI ID: 1398518

Scaling Deep Learning Workloads: NVIDIA DGX-1/Pascal and Intel Knights Landing
Conference · July 3, 2017 · OSTI ID: 1373860

Scaling deep learning workloads: NVIDIA DGX-1/Pascal and Intel Knights Landing
Conference · August 24, 2017 · OSTI ID: 1411927
