Integrated Model, Batch and Domain Parallelism in Training Neural Networks

Gholami, Amir; Azad, Ariful; Jin, Peter; Keutzer, Kurt; Buluc, Aydin

doi:10.1145/3210377.3210394


Journal Article · June 14, 2018

DOI: https://doi.org/10.1145/3210377.3210394 · OSTI ID: 1454501

Gholami, Amir [1]; Azad, Ariful [1]; Jin, Peter [1]; Keutzer, Kurt [1]; Buluc, Aydin [1]

[1] Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)

We propose a new integrated method of exploiting model, batch and domain parallelism for the training of deep neural networks (DNNs) on large distributed-memory computers using minibatch stochastic gradient descent (SGD). Our goal is to find an efficient parallelization strategy for a fixed batch size using $P$ processes. Our method is inspired by the communication-avoiding algorithms in numerical linear algebra. We see $P$ processes as logically divided into a $P_r \times P_c$ grid where the $P_r$ dimension is implicitly responsible for model/domain parallelism and the $P_c$ dimension is implicitly responsible for batch parallelism. In practice, the integrated matrix-based parallel algorithm encapsulates these types of parallelism automatically. We analyze the communication complexity and analytically demonstrate that the lowest communication costs are often achieved neither with pure model nor with pure data parallelism. We also show how the domain parallel approach can help in extending the theoretical scaling limit of the typical batch parallel method.
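The process grid described in the abstract can be illustrated with a small single-machine sketch. It is not the authors' implementation. It assumes that one layer's forward pass is the matrix product Y = W X, that the weight matrix W is split by rows across the P_r dimension, and that the minibatch X is split by columns across the P_c dimension. The names `grid_shapes`, `rank_to_coords` and `blocked_forward` are hypothetical, and no real communication takes place: the loop only mimics what each grid position (i, j) would compute locally.

```python
import numpy as np


def grid_shapes(P):
    """List every (P_r, P_c) pair with P_r * P_c == P."""
    return [(pr, P // pr) for pr in range(1, P + 1) if P % pr == 0]


def rank_to_coords(rank, P_c):
    """Map a flat process rank to (row, col) on the grid, row-major (an assumption)."""
    return divmod(rank, P_c)


def blocked_forward(W, X, P_r, P_c):
    """Compute Y = W @ X one grid block at a time.

    Assumed layout: W is split by rows into P_r blocks (model dimension),
    X is split by columns into P_c blocks (batch dimension). Grid position
    (i, j) produces the block W_i @ X_j of the output.
    """
    W_blocks = np.array_split(W, P_r, axis=0)
    X_blocks = np.array_split(X, P_c, axis=1)
    rows = [[W_blocks[i] @ X_blocks[j] for j in range(P_c)] for i in range(P_r)]
    return np.block(rows)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((6, 5))   # 6 output features, 5 input features
    X = rng.standard_normal((5, 8))   # minibatch of 8 samples stored as columns
    for P_r, P_c in grid_shapes(8):
        if P_r <= W.shape[0] and P_c <= X.shape[1]:
            Y = blocked_forward(W, X, P_r, P_c)
            print(P_r, P_c, np.allclose(Y, W @ X))
```

In this sketch, the grid with P_r = 1 corresponds to pure batch parallelism and the grid with P_c = 1 to pure model parallelism. The intermediate shapes are the mixed configurations whose communication costs the paper analyzes.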

Research Organization: Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)

Sponsoring Organization: USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR) (SC-21)

DOE Contract Number: AC02-05CH11231

OSTI ID: 1454501

Country of Publication: United States

Language: English



Related Subjects

97 MATHEMATICS AND COMPUTING
