Scaling Deep Learning workloads: NVIDIA DGX-1/Pascal and Intel Knights Landing
Abstract
Deep Learning (DL) algorithms have become ubiquitous in data analytics. As a result, major computing vendors—including NVIDIA, Intel, AMD, and IBM—have architectural road maps influenced by DL workloads. Furthermore, several vendors have recently advertised new computing products as accelerating large DL workloads. Unfortunately, it is difficult for data scientists to quantify the potential of these different products. This article provides a performance and power analysis of important DL workloads on two major parallel architectures: NVIDIA DGX-1 (eight Pascal P100 GPUs interconnected with NVLink) and Intel Knights Landing (KNL) CPUs interconnected with Intel Omni-Path or Cray Aries. Our evaluation consists of a cross section of convolutional neural net workloads: CifarNet, AlexNet, GoogLeNet, and ResNet50 topologies using the Cifar10 and ImageNet datasets. The workloads are vendor-optimized for each architecture. We use sequentially equivalent implementations to maintain iso-accuracy between parallel and sequential DL models. Our analysis indicates that although GPUs provide the highest overall performance, the gap can close for some convolutional networks; and the KNL can be competitive in performance/watt. We find that NVLink facilitates scaling efficiency on GPUs. However, its importance is heavily dependent on neural network architecture. Furthermore, for weak-scaling—sometimes encouraged by restricted GPU memory—NVLink is less important.
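The abstract's distinction between strong scaling (fixed problem size, more devices) and weak scaling (problem size grows with device count) can be made concrete with the standard efficiency formulas. The sketch below is purely illustrative; the throughput and timing numbers are invented for the example and are not results from the paper.

```python
def strong_scaling_efficiency(t1: float, tn: float, n: int) -> float:
    """Speedup on n devices relative to ideal linear speedup.

    t1: time to solution on 1 device; tn: time on n devices.
    """
    return (t1 / tn) / n

def weak_scaling_efficiency(r1: float, rn: float) -> float:
    """Per-device throughput on n devices relative to a single device,
    with the problem size (e.g., global minibatch) growing with n.

    r1: images/sec per device at n=1; rn: images/sec per device at n devices.
    """
    return rn / r1

# Hypothetical example: an epoch shrinks from 100 s to 15 s on 8 GPUs
# (strong scaling), while per-GPU throughput drops from 500 to 450
# images/sec when each GPU keeps its local batch size (weak scaling).
print(strong_scaling_efficiency(100.0, 15.0, 8))  # ~0.83
print(weak_scaling_efficiency(500.0, 450.0))      # 0.9
```

This matches the abstract's observation qualitatively: weak scaling keeps per-device work constant, so inter-device communication (e.g., over NVLink) is a smaller fraction of each step, and interconnect bandwidth matters less than in the strong-scaling case.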
- Authors:
- Gawande, Nitin A.; Daily, Jeff A.; Siegel, Charles; Tallent, Nathan R.; Vishnu, Abhinav
- Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
- Publication Date:
- May 5, 2018
- Research Org.:
- Pacific Northwest National Laboratory (PNNL), Richland, WA (United States)
- Sponsoring Org.:
- USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)
- OSTI Identifier:
- 1617450
- Alternate Identifier(s):
- OSTI ID: 1778383
- Report Number(s):
- PNNL-SA-134513
Journal ID: ISSN 0167-739X
- Grant/Contract Number:
- AC05-76RL01830
- Resource Type:
- Accepted Manuscript
- Journal Name:
- Future Generation Computer Systems
- Additional Journal Information:
- Journal Volume: 108; Journal Issue: C; Journal ID: ISSN 0167-739X
- Publisher:
- Elsevier
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 97 MATHEMATICS AND COMPUTING; NVIDIA DGX-1; Intel Knights Landing; Caffe; MaTEx; Deep learning; Convolutional neural networks
Citation Formats
Gawande, Nitin A., Daily, Jeff A., Siegel, Charles, Tallent, Nathan R., and Vishnu, Abhinav. Scaling Deep Learning workloads: NVIDIA DGX-1/Pascal and Intel Knights Landing. United States: N. p., 2018.
Web. doi:10.1016/j.future.2018.04.073.
Gawande, Nitin A., Daily, Jeff A., Siegel, Charles, Tallent, Nathan R., & Vishnu, Abhinav. Scaling Deep Learning workloads: NVIDIA DGX-1/Pascal and Intel Knights Landing. United States. https://doi.org/10.1016/j.future.2018.04.073
Gawande, Nitin A., Daily, Jeff A., Siegel, Charles, Tallent, Nathan R., and Vishnu, Abhinav. 2018.
"Scaling Deep Learning workloads: NVIDIA DGX-1/Pascal and Intel Knights Landing". United States. https://doi.org/10.1016/j.future.2018.04.073. https://www.osti.gov/servlets/purl/1617450.
@article{osti_1617450,
title = {Scaling Deep Learning workloads: NVIDIA DGX-1/Pascal and Intel Knights Landing},
author = {Gawande, Nitin A. and Daily, Jeff A. and Siegel, Charles and Tallent, Nathan R. and Vishnu, Abhinav},
abstractNote = {Deep Learning (DL) algorithms have become ubiquitous in data analytics. As a result, major computing vendors—including NVIDIA, Intel, AMD, and IBM—have architectural road maps influenced by DL workloads. Furthermore, several vendors have recently advertised new computing products as accelerating large DL workloads. Unfortunately, it is difficult for data scientists to quantify the potential of these different products. Here, this article provides a performance and power analysis of important DL workloads on two major parallel architectures: NVIDIA DGX-1 (eight Pascal P100 GPUs interconnected with NVLink) and Intel Knights Landing (KNL) CPUs interconnected with Intel Omni-Path or Cray Aries. Our evaluation consists of a cross section of convolutional neural net workloads: CifarNet, AlexNet, GoogLeNet, and ResNet50 topologies using the Cifar10 and ImageNet datasets. The workloads are vendor-optimized for each architecture. We use sequentially equivalent implementations to maintain iso-accuracy between parallel and sequential DL models. Our analysis indicates that although GPUs provide the highest overall performance, the gap can close for some convolutional networks; and the KNL can be competitive in performance/watt. We find that NVLink facilitates scaling efficiency on GPUs. However, its importance is heavily dependent on neural network architecture. Furthermore, for weak-scaling—sometimes encouraged by restricted GPU memory—NVLink is less important.},
doi = {10.1016/j.future.2018.04.073},
journal = {Future Generation Computer Systems},
number = {C},
volume = {108},
place = {United States},
year = {2018},
month = {5}
}
Works referenced in this record:
Going deeper with convolutions
conference, June 2015
- Szegedy, Christian
- 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
Searching for exotic particles in high-energy physics with deep learning
journal, July 2014
- Baldi, P.; Sadowski, P.; Whiteson, D.
- Nature Communications, Vol. 5, Issue 1
Caffe: Convolutional Architecture for Fast Feature Embedding
conference, January 2014
- Jia, Yangqing; Shelhamer, Evan; Donahue, Jeff
- Proceedings of the ACM International Conference on Multimedia - MM '14
Theano: A CPU and GPU Math Compiler in Python
conference, January 2010
- Bergstra, James; Breuleux, Olivier; Bastien, Frédéric
- Proceedings of the Python in Science Conference
Scaling Deep Learning Workloads: NVIDIA DGX-1/Pascal and Intel Knights Landing
conference, May 2017
- Gawande, Nitin A.; Landwehr, Joshua B.; Daily, Jeff A.
- 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)
Deep Residual Learning for Image Recognition
conference, June 2016
- He, Kaiming; Zhang, Xiangyu; Ren, Shaoqing
- 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
Knights Landing: Second-Generation Intel Xeon Phi Product
journal, March 2016
- Sodani, Avinash; Gramunt, Roger; Corbal, Jesus
- IEEE Micro, Vol. 36, Issue 2
80 Million Tiny Images: A Large Data Set for Nonparametric Object and Scene Recognition
journal, November 2008
- Torralba, A.; Fergus, R.; Freeman, W. T.
- IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 30, Issue 11
ImageNet Large Scale Visual Recognition Challenge
journal, April 2015
- Russakovsky, Olga; Deng, Jia; Su, Hao
- International Journal of Computer Vision, Vol. 115, Issue 3
FireCaffe: Near-Linear Acceleration of Deep Neural Network Training on Compute Clusters
conference, June 2016
- Iandola, Forrest N.; Moskewicz, Matthew W.; Ashraf, Khalid
- 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
RAPL: memory power estimation and capping
conference, January 2010
- David, Howard; Gorbatov, Eugene; Hanebutte, Ulf R.
- Proceedings of the 16th ACM/IEEE international symposium on Low power electronics and design - ISLPED '10
Benchmarking State-of-the-Art Deep Learning Software Tools
conference, November 2016
- Shi, Shaohuai; Wang, Qiang; Xu, Pengfei
- 2016 7th International Conference on Cloud Computing and Big Data (CCBD)
Large Minibatch Training on Supercomputers with Improved Accuracy and Reduced Time to Train
conference, November 2018
- Codreanu, Valeriu; Podareanu, Damian; Saletore, Vikram
- 2018 IEEE/ACM Machine Learning in HPC Environments (MLHPC)
Works referencing / citing this record:
Applications of Artificial Intelligence Methodologies to Behavioral and Social Sciences
journal, December 2019
- Robila, Mihaela; Robila, Stefan A.
- Journal of Child and Family Studies, Vol. 29, Issue 10
A Framework for Memory Oversubscription Management in Graphics Processing Units
conference, April 2019
- Li, Chen; Ausavarungnirun, Rachata; Rossbach, Christopher J.
- ASPLOS '19: Architectural Support for Programming Languages and Operating Systems, Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems