
DOE PAGES

This content will become publicly available on May 5, 2019

Title: Scaling Deep Learning workloads: NVIDIA DGX-1/Pascal and Intel Knights Landing

Deep Learning (DL) algorithms have become ubiquitous in data analytics. As a result, major computing vendors—including NVIDIA, Intel, AMD, and IBM—have architectural road maps influenced by DL workloads. Furthermore, several vendors have recently advertised new computing products as accelerating large DL workloads. Unfortunately, it is difficult for data scientists to quantify the potential of these different products. Here, this article provides a performance and power analysis of important DL workloads on two major parallel architectures: NVIDIA DGX-1 (eight Pascal P100 GPUs interconnected with NVLink) and Intel Knights Landing (KNL) CPUs interconnected with Intel Omni-Path or Cray Aries. Our evaluation consists of a cross section of convolutional neural net workloads: CifarNet, AlexNet, GoogLeNet, and ResNet50 topologies using the Cifar10 and ImageNet datasets. The workloads are vendor-optimized for each architecture. We use sequentially equivalent implementations to maintain iso-accuracy between parallel and sequential DL models. Our analysis indicates that although GPUs provide the highest overall performance, the gap can close for some convolutional networks; and the KNL can be competitive in performance/watt. We find that NVLink facilitates scaling efficiency on GPUs. However, its importance is heavily dependent on neural network architecture. Furthermore, for weak-scaling—sometimes encouraged by restricted GPU memory—NVLink is less important.
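The abstract's two key metrics, scaling efficiency across devices and performance per watt, can be illustrated with a short sketch. The numbers below are hypothetical and chosen only to show how the metrics are computed; they are not results from the paper.

```python
def scaling_efficiency(throughput_1, throughput_n, n):
    """Strong-scaling efficiency: measured speedup divided by ideal speedup n."""
    return (throughput_n / throughput_1) / n

def perf_per_watt(throughput, watts):
    """Performance per watt, e.g. images/sec per watt consumed."""
    return throughput / watts

# Hypothetical example: 1 GPU at 500 img/s, 8 GPUs at 3200 img/s,
# drawing 2000 W in total.
eff = scaling_efficiency(500.0, 3200.0, 8)  # 0.8, i.e. 80% efficiency
ppw = perf_per_watt(3200.0, 2000.0)         # 1.6 img/s per watt
print(eff, ppw)
```

An efficiency well below 1.0 at high device counts is what points to interconnect (e.g. NVLink vs. Omni-Path/Aries) and network-architecture effects of the kind the paper analyzes.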
Authors:
 Gawande, Nitin A. [1]; Daily, Jeff A. [1]; Siegel, Charles [1]; Tallent, Nathan R. [1]; Vishnu, Abhinav [1]
  1. Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
Publication Date:
May 2018
Report Number(s):
PNNL-SA-134513
Journal ID: ISSN 0167-739X; PII: S0167739X17318599
Grant/Contract Number:
AC05-76RL01830
Type:
Accepted Manuscript
Journal Name:
Future Generation Computer Systems
Additional Journal Information:
Journal Name: Future Generation Computer Systems; Journal ID: ISSN 0167-739X
Publisher:
Elsevier
Research Org:
Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
Sponsoring Org:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR) (SC-21)
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; NVIDIA DGX-1; Intel Knights Landing; Caffe; maTEx; Deep learning; Convolutional neural networks
OSTI Identifier:
1437017

Gawande, Nitin A., Daily, Jeff A., Siegel, Charles, Tallent, Nathan R., and Vishnu, Abhinav. Scaling Deep Learning workloads: NVIDIA DGX-1/Pascal and Intel Knights Landing. United States: N. p., Web. doi:10.1016/j.future.2018.04.073.
@article{osti_1437017,
title = {Scaling Deep Learning workloads: NVIDIA DGX-1/Pascal and Intel Knights Landing},
author = {Gawande, Nitin A. and Daily, Jeff A. and Siegel, Charles and Tallent, Nathan R. and Vishnu, Abhinav},
abstractNote = {Deep Learning (DL) algorithms have become ubiquitous in data analytics. As a result, major computing vendors—including NVIDIA, Intel, AMD, and IBM—have architectural road maps influenced by DL workloads. Furthermore, several vendors have recently advertised new computing products as accelerating large DL workloads. Unfortunately, it is difficult for data scientists to quantify the potential of these different products. Here, this article provides a performance and power analysis of important DL workloads on two major parallel architectures: NVIDIA DGX-1 (eight Pascal P100 GPUs interconnected with NVLink) and Intel Knights Landing (KNL) CPUs interconnected with Intel Omni-Path or Cray Aries. Our evaluation consists of a cross section of convolutional neural net workloads: CifarNet, AlexNet, GoogLeNet, and ResNet50 topologies using the Cifar10 and ImageNet datasets. The workloads are vendor-optimized for each architecture. We use sequentially equivalent implementations to maintain iso-accuracy between parallel and sequential DL models. Our analysis indicates that although GPUs provide the highest overall performance, the gap can close for some convolutional networks; and the KNL can be competitive in performance/watt. We find that NVLink facilitates scaling efficiency on GPUs. However, its importance is heavily dependent on neural network architecture. Furthermore, for weak-scaling—sometimes encouraged by restricted GPU memory—NVLink is less important.},
doi = {10.1016/j.future.2018.04.073},
journal = {Future Generation Computer Systems},
place = {United States},
year = {2018},
month = {5}
}