U.S. Department of Energy
Office of Scientific and Technical Information

Integrated Model, Batch and Domain Parallelism in Training Neural Networks

Journal Article
Authors: [1]; [1]; [1]; [1]; [1]
  1. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)

We propose a new integrated method of exploiting model, batch and domain parallelism for the training of deep neural networks (DNNs) on large distributed-memory computers using minibatch stochastic gradient descent (SGD). Our goal is to find an efficient parallelization strategy for a fixed batch size using $P$ processes. Our method is inspired by the communication-avoiding algorithms in numerical linear algebra. We see $P$ processes as logically divided into a $P_r \times P_c$ grid where the $P_r$ dimension is implicitly responsible for model/domain parallelism and the $P_c$ dimension is implicitly responsible for batch parallelism. In practice, the integrated matrix-based parallel algorithm encapsulates these types of parallelism automatically. We analyze the communication complexity and analytically demonstrate that the lowest communication costs are often achieved neither with pure model nor with pure data parallelism. We also show how the domain parallel approach can help in extending the theoretical scaling limit of the typical batch parallel method.
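To make the process-grid decomposition concrete, the sketch below (illustrative only, not the authors' released code) arranges $P$ MPI ranks into a $P_r \times P_c$ grid with mpi4py: gradients for a model/domain shard are summed across the batch-parallel dimension with an all-reduce, while partial layer outputs are combined across the model-parallel dimension with an all-gather. The grid shape, buffer sizes, and communicator names are assumptions chosen for illustration, not the paper's exact algorithm.

```python
# Illustrative sketch of a Pr x Pc process grid for hybrid model/batch
# parallelism (assumed layout; not the paper's implementation).
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
P = comm.Get_size()
rank = comm.Get_rank()

Pr = 2                       # model/domain-parallel dimension (assumed)
Pc = P // Pr                 # batch-parallel dimension
assert Pr * Pc == P, "P must factor as Pr * Pc"

row = rank // Pc             # which model/domain shard this rank owns
col = rank % Pc              # which minibatch shard this rank processes

# Ranks in the same row hold the same model shard on different batch shards:
# their gradients for that shard are summed (batch parallelism).
row_comm = comm.Split(color=row, key=col)
# Ranks in the same column hold different model shards of the same batch shard:
# their partial outputs are concatenated (model/domain parallelism).
col_comm = comm.Split(color=col, key=row)

local_grad = np.random.rand(1024)            # gradient of this rank's shard (dummy data)
summed_grad = np.empty_like(local_grad)
row_comm.Allreduce(local_grad, summed_grad, op=MPI.SUM)

local_out = np.random.rand(256)              # partial layer output (dummy data)
full_out = np.empty(256 * Pr, dtype=np.float64)
col_comm.Allgather(local_out, full_out)
```

Launched with, e.g., `mpirun -n 8 python grid_sketch.py`, this gives $P_r = 2$ and $P_c = 4$; pure data parallelism corresponds to $P_r = 1$ and pure model parallelism to $P_c = 1$, the two extremes against which the abstract's communication-cost analysis compares the integrated scheme.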

Research Organization:
Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR) (SC-21)
DOE Contract Number:
AC02-05CH11231
OSTI ID:
1454501
Country of Publication:
United States
Language:
English


Similar Records

Improving Deep Neural Networks’ Training for Image Classification With Nonlinear Conjugate Gradient-Style Adaptive Momentum
Journal Article · March 24, 2023 · IEEE Transactions on Neural Networks and Learning Systems · OSTI ID: 2280651

Train Like a (Var)Pro: Efficient Training of Neural Networks with Variable Projection
Journal Article · October 5, 2021 · SIAM Journal on Mathematics of Data Science · OSTI ID: 1834344

Improving scalability of parallel CNN training by adaptively adjusting parameter update frequency
Journal Article · September 29, 2021 · Journal of Parallel and Distributed Computing · OSTI ID: 1864734
