OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Scaling deep learning workloads: NVIDIA DGX-1/Pascal and Intel Knights Landing

Abstract

Deep Learning (DL) algorithms have become ubiquitous in data analytics. As a result, major computing vendors, including NVIDIA, Intel, AMD, and IBM, have architectural roadmaps influenced by DL workloads. Furthermore, several vendors have recently advertised new computing products as accelerating large DL workloads. Unfortunately, it is difficult for data scientists to quantify the potential of these different products. This paper provides a performance and power analysis of important DL workloads on two major parallel architectures: NVIDIA DGX-1 (eight Pascal P100 GPUs interconnected with NVLink) and Intel Knights Landing (KNL) CPUs interconnected with Intel Omni-Path or Cray Aries. Our evaluation consists of a cross section of convolutional neural network workloads: the CifarNet, AlexNet, GoogLeNet, and ResNet50 topologies using the Cifar10 and ImageNet datasets. The workloads are vendor-optimized for each architecture. Our analysis indicates that although GPUs provide the highest overall performance, the gap can close for some convolutional networks, and the KNL can be competitive in performance per watt. We find that NVLink facilitates scaling efficiency on GPUs; however, its importance is heavily dependent on neural network architecture. Furthermore, for weak scaling, sometimes encouraged by restricted GPU memory, NVLink is less important.
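To make the metrics discussed in the abstract concrete, the following is a minimal Python sketch of how scaling efficiency and performance per watt can be computed from measured training throughput. The function names and the placeholder numbers are assumptions for illustration only; they are not measurements or results from the DGX-1 or KNL experiments reported by the authors.

# Illustrative helpers for the metrics named in the abstract.
# All values below are hypothetical placeholders, not paper results.

def scaling_efficiency(throughput_1, throughput_n, n_devices):
    """Efficiency relative to ideal linear speedup.

    throughput_1: images/sec measured on a single device
    throughput_n: aggregate images/sec measured across n_devices
    """
    return throughput_n / (throughput_1 * n_devices)

def performance_per_watt(throughput, power_watts):
    """Images/sec delivered per watt of measured power draw."""
    return throughput / power_watts

if __name__ == "__main__":
    # Placeholder numbers purely to show the calculation.
    single_device = 500.0   # images/sec on 1 device (hypothetical)
    eight_devices = 3200.0  # aggregate images/sec on 8 devices (hypothetical)
    power = 2500.0          # watts drawn during the 8-device run (hypothetical)

    print(f"scaling efficiency: {scaling_efficiency(single_device, eight_devices, 8):.2f}")
    print(f"images/sec per watt: {performance_per_watt(eight_devices, power):.2f}")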

Authors:
Gawande, Nitin A.; Landwehr, Joshua B.; Daily, Jeffrey A.; Tallent, Nathan R.; Vishnu, Abhinav; Kerbyson, Darren J.
Publication Date:
August 24, 2017
Research Org.:
Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1411927
Report Number(s):
PNNL-SA-129129
KJ0402010
DOE Contract Number:  
AC05-76RL01830
Resource Type:
Conference
Resource Relation:
Conference: IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW 2017), May 29-June 2, 2017, Lake Buena Vista, Florida, 399-408
Country of Publication:
United States
Language:
English

Citation Formats

Gawande, Nitin A., Landwehr, Joshua B., Daily, Jeffrey A., Tallent, Nathan R., Vishnu, Abhinav, and Kerbyson, Darren J. Scaling deep learning workloads: NVIDIA DGX-1/Pascal and Intel Knights Landing. United States: N. p., 2017. Web. doi:10.1109/IPDPSW.2017.36.
Gawande, Nitin A., Landwehr, Joshua B., Daily, Jeffrey A., Tallent, Nathan R., Vishnu, Abhinav, & Kerbyson, Darren J. Scaling deep learning workloads: NVIDIA DGX-1/Pascal and Intel Knights Landing. United States. doi:10.1109/IPDPSW.2017.36.
Gawande, Nitin A., Landwehr, Joshua B., Daily, Jeffrey A., Tallent, Nathan R., Vishnu, Abhinav, and Kerbyson, Darren J. 2017. "Scaling deep learning workloads: NVIDIA DGX-1/Pascal and Intel Knights Landing". United States. doi:10.1109/IPDPSW.2017.36.
@article{osti_1411927,
title = {Scaling deep learning workloads: NVIDIA DGX-1/Pascal and Intel Knights Landing},
author = {Gawande, Nitin A. and Landwehr, Joshua B. and Daily, Jeffrey A. and Tallent, Nathan R. and Vishnu, Abhinav and Kerbyson, Darren J.},
abstractNote = {Deep Learning (DL) algorithms have become ubiquitous in data analytics. As a result, major computing vendors --- including NVIDIA, Intel, AMD, and IBM --- have architectural road-maps influenced by DL workloads. Furthermore, several vendors have recently advertised new computing products as accelerating large DL workloads. Unfortunately, it is difficult for data scientists to quantify the potential of these different products. This paper provides a performance and power analysis of important DL workloads on two major parallel architectures: NVIDIA DGX-1 (eight Pascal P100 GPUs interconnected with NVLink) and Intel Knights Landing (KNL) CPUs interconnected with Intel Omni-Path or Cray Aries. Our evaluation consists of a cross section of convolutional neural net workloads: CifarNet, AlexNet, GoogLeNet, and ResNet50 topologies using the Cifar10 and ImageNet datasets. The workloads are vendor-optimized for each architecture. Our analysis indicates that although GPUs provide the highest overall performance, the gap can close for some convolutional networks; and the KNL can be competitive in performance/watt. We find that NVLink facilitates scaling efficiency on GPUs. However, its importance is heavily dependent on neural network architecture. Furthermore, for weak-scaling --- sometimes encouraged by restricted GPU memory --- NVLink is less important.},
doi = {10.1109/IPDPSW.2017.36},
place = {United States},
year = {2017},
month = {aug}
}

Other availability
Please see Document Availability for additional information on obtaining the full-text document. Library patrons may search WorldCat to identify libraries that hold this conference proceeding.
