U.S. Department of Energy
Office of Scientific and Technical Information

Scaling Deep Learning workloads: NVIDIA DGX-1/Pascal and Intel Knights Landing

Journal Article · Future Generation Computer Systems

Deep Learning (DL) algorithms have become ubiquitous in data analytics. As a result, major computing vendors, including NVIDIA, Intel, AMD, and IBM, have architectural roadmaps influenced by DL workloads, and several vendors have recently advertised new computing products as accelerating large DL workloads. Unfortunately, it is difficult for data scientists to quantify the potential of these different products. This article provides a performance and power analysis of important DL workloads on two major parallel architectures: NVIDIA DGX-1 (eight Pascal P100 GPUs interconnected with NVLink) and Intel Knights Landing (KNL) CPUs interconnected with Intel Omni-Path or Cray Aries. Our evaluation consists of a cross section of convolutional neural network workloads: the CifarNet, AlexNet, GoogLeNet, and ResNet50 topologies, using the Cifar10 and ImageNet datasets. The workloads are vendor-optimized for each architecture, and we use sequentially equivalent implementations to maintain iso-accuracy between parallel and sequential DL models. Our analysis indicates that although GPUs provide the highest overall performance, the gap can close for some convolutional networks, and KNL can be competitive in performance per watt. We find that NVLink facilitates scaling efficiency on GPUs, but its importance depends heavily on the neural network architecture. For weak scaling, which restricted GPU memory sometimes encourages, NVLink matters less.
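To make the abstract's terminology concrete: under weak scaling, each device keeps a fixed per-device minibatch, so the global minibatch grows with the device count; under strong scaling, the global minibatch is fixed and sharded across devices. The Python sketch below illustrates this arithmetic along with a throughput-per-watt figure of merit. It is illustrative only, not taken from the paper; all batch sizes, throughput numbers, and power figures are hypothetical placeholders.

    # Illustrative sketch (hypothetical values, not from the paper).
    # Weak scaling keeps the per-device minibatch fixed, so the global
    # minibatch grows with device count; strong scaling keeps the global
    # minibatch fixed and shards it across devices.
    def global_batch(per_device_batch: int, devices: int, weak_scaling: bool) -> int:
        if weak_scaling:
            return per_device_batch * devices
        return per_device_batch  # strong scaling: each device sees batch/devices samples

    # Performance-per-watt figure of merit: measured training throughput
    # divided by average power draw during the run.
    def perf_per_watt(images_per_sec: float, avg_power_watts: float) -> float:
        return images_per_sec / avg_power_watts

    # Hypothetical example: 8 GPUs with a per-GPU minibatch of 32 under
    # weak scaling give a global minibatch of 256 images per step.
    print(global_batch(32, 8, weak_scaling=True))   # 256
    # Hypothetical measurements: 2000 img/s at 1500 W vs. 400 img/s at 250 W.
    print(perf_per_watt(2000.0, 1500.0))            # ~1.33 images/s per watt
    print(perf_per_watt(400.0, 250.0))              # 1.60 images/s per watt

A system with lower absolute throughput can still win on this metric, which is how a KNL-class CPU can be competitive in performance per watt while the GPUs retain the raw performance lead.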

Research Organization:
Pacific Northwest National Laboratory (PNNL), Richland, WA (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR) (SC-21)
Grant/Contract Number:
AC05-76RL01830
OSTI ID:
1617450
Report Number(s):
PNNL-SA-134513
Journal Information:
Future Generation Computer Systems, Vol. 108, Issue C; ISSN 0167-739X
Publisher:
Elsevier
Country of Publication:
United States
Language:
English

References (13)

ImageNet Large Scale Visual Recognition Challenge · journal · April 2015
Searching for exotic particles in high-energy physics with deep learning · journal · July 2014
Benchmarking State-of-the-Art Deep Learning Software Tools · conference · November 2016
Going deeper with convolutions · conference · June 2015
FireCaffe: Near-Linear Acceleration of Deep Neural Network Training on Compute Clusters · conference · June 2016
Deep Residual Learning for Image Recognition · conference · June 2016
Scaling Deep Learning Workloads: NVIDIA DGX-1/Pascal and Intel Knights Landing · conference · May 2017
  • Gawande, Nitin A.; Landwehr, Joshua B.; Daily, Jeff A.
  • 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW). https://doi.org/10.1109/IPDPSW.2017.36
Large Minibatch Training on Supercomputers with Improved Accuracy and Reduced Time to Train · conference · November 2018
Knights Landing: Second-Generation Intel Xeon Phi Product · journal · March 2016
80 Million Tiny Images: A Large Data Set for Nonparametric Object and Scene Recognition · journal · November 2008
RAPL: memory power estimation and capping · conference · January 2010
  • David, Howard; Gorbatov, Eugene; Hanebutte, Ulf R.
  • Proceedings of the 16th ACM/IEEE International Symposium on Low Power Electronics and Design (ISLPED '10). https://doi.org/10.1145/1840845.1840883
Caffe: Convolutional Architecture for Fast Feature Embedding · conference · January 2014
Theano: A CPU and GPU Math Compiler in Python · conference · January 2010

Cited By (2)

Applications of Artificial Intelligence Methodologies to Behavioral and Social Sciences · journal · December 2019
A Framework for Memory Oversubscription Management in Graphics Processing Units · conference · April 2019
  • Li, Chen; Ausavarungnirun, Rachata; Rossbach, Christopher J.
  • ASPLOS '19: Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems. https://doi.org/10.1145/3297858.3304044

Figures / Tables (19)


Similar Records

Scaling deep learning workloads: NVIDIA DGX-1/Pascal and Intel Knights Landing
Conference · August 24, 2017 · OSTI ID: 1411927

Scaling Deep Learning Workloads: NVIDIA DGX-1/Pascal and Intel Knights Landing
Conference · July 3, 2017 · OSTI ID: 1373860

Evaluating On-Node GPU Interconnects for Deep Learning Workloads
Conference · December 31, 2017 · OSTI ID: 1525777