DOE PAGES: U.S. Department of Energy
Office of Scientific and Technical Information

Title: Scaling Deep Learning workloads: NVIDIA DGX-1/Pascal and Intel Knights Landing

Abstract

Deep Learning (DL) algorithms have become ubiquitous in data analytics. As a result, major computing vendors, including NVIDIA, Intel, AMD, and IBM, have architectural road maps influenced by DL workloads. Furthermore, several vendors have recently advertised new computing products as accelerating large DL workloads. Unfortunately, it is difficult for data scientists to quantify the potential of these different products. This article provides a performance and power analysis of important DL workloads on two major parallel architectures: the NVIDIA DGX-1 (eight Pascal P100 GPUs interconnected with NVLink) and Intel Knights Landing (KNL) CPUs interconnected with Intel Omni-Path or Cray Aries. Our evaluation consists of a cross section of convolutional neural network workloads: the CifarNet, AlexNet, GoogLeNet, and ResNet50 topologies using the Cifar10 and ImageNet datasets. The workloads are vendor-optimized for each architecture. We use sequentially equivalent implementations to maintain iso-accuracy between parallel and sequential DL models. Our analysis indicates that although GPUs provide the highest overall performance, the gap can close for some convolutional networks, and the KNL can be competitive in performance per watt. We find that NVLink facilitates scaling efficiency on GPUs, but its importance depends heavily on the neural network architecture. Furthermore, for weak scaling, which restricted GPU memory sometimes encourages, NVLink is less important.
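
The abstract's comparisons rest on a few simple metrics: training throughput (images per second), scaling efficiency across devices, and performance per watt. The short Python sketch below is only illustrative; the helper names and numbers are assumptions and do not come from the paper. It shows how such metrics are commonly derived from measured throughput and average power.

# Minimal sketch (hypothetical names and numbers, not the paper's data or code).

def scaling_efficiency(single_dev_throughput, aggregate_throughput, n_devices):
    """Measured aggregate throughput relative to ideal linear scaling
    (n_devices times the single-device throughput). The same ratio applies to
    strong scaling (a fixed global minibatch split across devices) and to
    weak scaling (per-device minibatch held fixed)."""
    return aggregate_throughput / (n_devices * single_dev_throughput)

def perf_per_watt(throughput, avg_power_watts):
    """Images per second per watt, the energy-efficiency metric."""
    return throughput / avg_power_watts

# Hypothetical example: one GPU vs. eight GPUs on one CNN topology.
t1, t8 = 500.0, 3400.0                 # images/s (illustrative values only)
print(f"scaling efficiency at 8 GPUs: {scaling_efficiency(t1, t8, 8):.2f}")
print(f"perf/W at 8 GPUs: {perf_per_watt(t8, 8 * 300.0):.2f} img/s/W")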

Authors:
 Gawande, Nitin A. [1]; Daily, Jeff A. [1]; Siegel, Charles [1]; Tallent, Nathan R. [1]; Vishnu, Abhinav [1]
  1. Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
Publication Date:
2018-05-05
Research Org.:
Pacific Northwest National Laboratory (PNNL), Richland, WA (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)
OSTI Identifier:
1617450
Alternate Identifier(s):
OSTI ID: 1778383
Report Number(s):
PNNL-SA-134513
Journal ID: ISSN 0167-739X
Grant/Contract Number:  
AC05-76RL01830
Resource Type:
Accepted Manuscript
Journal Name:
Future Generation Computer Systems
Additional Journal Information:
Journal Volume: 108; Journal Issue: C; Journal ID: ISSN 0167-739X
Publisher:
Elsevier
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; NVIDIA DGX-1; Intel Knights Landing; Caffe; MaTEx; Deep learning; Convolutional neural networks

Citation Formats

Gawande, Nitin A., Daily, Jeff A., Siegel, Charles, Tallent, Nathan R., and Vishnu, Abhinav. Scaling Deep Learning workloads: NVIDIA DGX-1/Pascal and Intel Knights Landing. United States: N. p., 2018. Web. doi:10.1016/j.future.2018.04.073.
Gawande, Nitin A., Daily, Jeff A., Siegel, Charles, Tallent, Nathan R., & Vishnu, Abhinav. Scaling Deep Learning workloads: NVIDIA DGX-1/Pascal and Intel Knights Landing. United States. https://doi.org/10.1016/j.future.2018.04.073
Gawande, Nitin A., Daily, Jeff A., Siegel, Charles, Tallent, Nathan R., and Vishnu, Abhinav. 2018. "Scaling Deep Learning workloads: NVIDIA DGX-1/Pascal and Intel Knights Landing". United States. https://doi.org/10.1016/j.future.2018.04.073. https://www.osti.gov/servlets/purl/1617450.
@article{osti_1617450,
title = {Scaling Deep Learning workloads: NVIDIA DGX-1/Pascal and Intel Knights Landing},
author = {Gawande, Nitin A. and Daily, Jeff A. and Siegel, Charles and Tallent, Nathan R. and Vishnu, Abhinav},
abstractNote = {Deep Learning (DL) algorithms have become ubiquitous in data analytics. As a result, major computing vendors—including NVIDIA, Intel, AMD, and IBM—have architectural road maps influenced by DL workloads. Furthermore, several vendors have recently advertised new computing products as accelerating large DL workloads. Unfortunately, it is difficult for data scientists to quantify the potential of these different products. Here, this article provides a performance and power analysis of important DL workloads on two major parallel architectures: NVIDIA DGX-1 (eight Pascal P100 GPUs interconnected with NVLink) and Intel Knights Landing (KNL) CPUs interconnected with Intel Omni-Path or Cray Aries. Our evaluation consists of a cross section of convolutional neural net workloads: CifarNet, AlexNet, GoogLeNet, and ResNet50 topologies using the Cifar10 and ImageNet datasets. The workloads are vendor-optimized for each architecture. We use sequentially equivalent implementations to maintain iso-accuracy between parallel and sequential DL models. Our analysis indicates that although GPUs provide the highest overall performance, the gap can close for some convolutional networks; and the KNL can be competitive in performance/watt. We find that NVLink facilitates scaling efficiency on GPUs. However, its importance is heavily dependent on neural network architecture. Furthermore, for weak-scaling—sometimes encouraged by restricted GPU memory—NVLink is less important.},
doi = {10.1016/j.future.2018.04.073},
journal = {Future Generation Computer Systems},
number = {C},
volume = 108,
place = {United States},
year = {2018},
month = {may}
}

Citation Metrics:
Cited by: 9 works
Citation information provided by Web of Science

Figures / Tables:

Fig. 1: Diagram of DGX-1 topology.

Works referenced in this record:

Going deeper with convolutions
conference, June 2015


Searching for exotic particles in high-energy physics with deep learning
journal, July 2014

  • Baldi, P.; Sadowski, P.; Whiteson, D.
  • Nature Communications, Vol. 5, Issue 1
  • DOI: 10.1038/ncomms5308

Caffe: Convolutional Architecture for Fast Feature Embedding
conference, January 2014

  • Jia, Yangqing; Shelhamer, Evan; Donahue, Jeff
  • Proceedings of the ACM International Conference on Multimedia - MM '14
  • DOI: 10.1145/2647868.2654889

Theano: A CPU and GPU Math Compiler in Python
conference, January 2010


Scaling Deep Learning Workloads: NVIDIA DGX-1/Pascal and Intel Knights Landing
conference, May 2017

  • Gawande, Nitin A.; Landwehr, Joshua B.; Daily, Jeff A.
  • 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)
  • DOI: 10.1109/IPDPSW.2017.36

Deep Residual Learning for Image Recognition
conference, June 2016

  • He, Kaiming; Zhang, Xiangyu; Ren, Shaoqing
  • 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
  • DOI: 10.1109/CVPR.2016.90

Knights Landing: Second-Generation Intel Xeon Phi Product
journal, March 2016

  • Sodani, Avinash; Gramunt, Roger; Corbal, Jesus
  • IEEE Micro, Vol. 36, Issue 2
  • DOI: 10.1109/MM.2016.25

80 Million Tiny Images: A Large Data Set for Nonparametric Object and Scene Recognition
journal, November 2008

  • Torralba, A.; Fergus, R.; Freeman, W. T.
  • IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 30, Issue 11
  • DOI: 10.1109/TPAMI.2008.128

ImageNet Large Scale Visual Recognition Challenge
journal, April 2015

  • Russakovsky, Olga; Deng, Jia; Su, Hao
  • International Journal of Computer Vision, Vol. 115, Issue 3
  • DOI: 10.1007/s11263-015-0816-y

FireCaffe: Near-Linear Acceleration of Deep Neural Network Training on Compute Clusters
conference, June 2016

  • Iandola, Forrest N.; Moskewicz, Matthew W.; Ashraf, Khalid
  • 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
  • DOI: 10.1109/CVPR.2016.284

RAPL: memory power estimation and capping
conference, January 2010

  • David, Howard; Gorbatov, Eugene; Hanebutte, Ulf R.
  • Proceedings of the 16th ACM/IEEE international symposium on Low power electronics and design - ISLPED '10
  • DOI: 10.1145/1840845.1840883

Benchmarking State-of-the-Art Deep Learning Software Tools
conference, November 2016

  • Shi, Shaohuai; Wang, Qiang; Xu, Pengfei
  • 2016 7th International Conference on Cloud Computing and Big Data (CCBD)
  • DOI: 10.1109/CCBD.2016.029

Large Minibatch Training on Supercomputers with Improved Accuracy and Reduced Time to Train
conference, November 2018

  • Codreanu, Valeriu; Podareanu, Damian; Saletore, Vikram
  • 2018 IEEE/ACM Machine Learning in HPC Environments (MLHPC)
  • DOI: 10.1109/MLHPC.2018.8638634

Works referencing / citing this record:

Applications of Artificial Intelligence Methodologies to Behavioral and Social Sciences
journal, December 2019


A Framework for Memory Oversubscription Management in Graphics Processing Units
conference, April 2019

  • Li, Chen; Ausavarungnirun, Rachata; Rossbach, Christopher J.
  • Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '19)
  • DOI: 10.1145/3297858.3304044