OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Integrated Model, Batch and Domain Parallelism in Training Neural Networks

Abstract

We propose a new integrated method of exploiting model, batch and domain parallelism for the training of deep neural networks (DNNs) on large distributed-memory computers using minibatch stochastic gradient descent (SGD). Our goal is to find an efficient parallelization strategy for a fixed batch size using $P$ processes. Our method is inspired by the communication-avoiding algorithms in numerical linear algebra. We see $P$ processes as logically divided into a $P_r \times P_c$ grid, where the $P_r$ dimension is implicitly responsible for model/domain parallelism and the $P_c$ dimension is implicitly responsible for batch parallelism. In practice, the integrated matrix-based parallel algorithm encapsulates these types of parallelism automatically. We analyze the communication complexity and analytically demonstrate that the lowest communication costs are often achieved neither with pure model nor with pure data parallelism. We also show how the domain parallel approach can help in extending the theoretical scaling limit of the typical batch parallel method.
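To make the process-grid idea concrete, the sketch below is my own minimal illustration (not the authors' code) of how a $P_r \times P_c$ layout could be set up with mpi4py communicator splits: ranks sharing a row form the batch-parallel ($P_c$) groups that would average gradients, while ranks sharing a column form the model/domain-parallel ($P_r$) groups that would exchange partitioned activations or weights. The grid shape and the helper name are assumptions for illustration only.

# Minimal illustrative sketch (not the authors' implementation): arrange P MPI
# ranks as a P_r x P_c grid using mpi4py. Function name and grid shape are
# assumptions chosen for this example.
from mpi4py import MPI

def make_process_grid(comm, p_r, p_c):
    """Split comm (of size P = p_r * p_c) into row and column communicators.

    Ranks in the same row hold the same model/domain partition but different
    minibatch shards, so gradient averaging would use the row communicator.
    Ranks in the same column hold different model/domain partitions of the
    same minibatch shard, so model/domain exchanges would use the column
    communicator.
    """
    assert comm.Get_size() == p_r * p_c, "P must equal P_r * P_c"
    rank = comm.Get_rank()
    row = rank // p_c   # index along the P_r (model/domain) dimension
    col = rank % p_c    # index along the P_c (batch) dimension

    row_comm = comm.Split(color=row, key=col)  # spans the P_c (batch) dimension
    col_comm = comm.Split(color=col, key=row)  # spans the P_r (model/domain) dimension
    return row_comm, col_comm

if __name__ == "__main__":
    world = MPI.COMM_WORLD
    # Example shape: 8 ranks arranged as a 2 x 4 grid (P_r = 2, P_c = 4).
    row_comm, col_comm = make_process_grid(world, p_r=2, p_c=4)
    # A data-parallel gradient allreduce would use row_comm.Allreduce(...);
    # model/domain-parallel exchanges would use col_comm.

In this picture, pure data parallelism corresponds to $P_r = 1$ and pure model parallelism to $P_c = 1$; the paper's communication analysis concerns choosing an intermediate grid shape, which is often cheaper than either extreme.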

Authors:
Gholami, Amir [1]; Azad, Ariful [1]; Jin, Peter [1]; Keutzer, Kurt [1]; Buluc, Aydin [1]
  1. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Publication Date:
June 2018
Research Org.:
Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR) (SC-21)
OSTI Identifier:
1454501
DOE Contract Number:  
AC02-05CH11231
Resource Type:
Journal Article
Resource Relation:
Conference: 30th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), Vienna, Austria, 16-18 Jul 2018
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING

Citation Formats

Gholami, Amir, Azad, Ariful, Jin, Peter, Keutzer, Kurt, and Buluc, Aydin. Integrated Model, Batch and Domain Parallelism in Training Neural Networks. United States: N. p., 2018. Web. doi:10.1145/3210377.3210394.
Gholami, Amir, Azad, Ariful, Jin, Peter, Keutzer, Kurt, & Buluc, Aydin. Integrated Model, Batch and Domain Parallelism in Training Neural Networks. United States. doi:10.1145/3210377.3210394.
Gholami, Amir, Azad, Ariful, Jin, Peter, Keutzer, Kurt, and Buluc, Aydin. 2018. "Integrated Model, Batch and Domain Parallelism in Training Neural Networks". United States. doi:10.1145/3210377.3210394.
@article{osti_1454501,
title = {Integrated Model, Batch and Domain Parallelism in Training Neural Networks},
author = {Gholami, Amir and Azad, Ariful and Jin, Peter and Keutzer, Kurt and Buluc, Aydin},
abstractNote = {We propose a new integrated method of exploiting model, batch and domain parallelism for the training of deep neural networks (DNNs) on large distributed-memory computers using minibatch stochastic gradient descent (SGD). Our goal is to find an efficient parallelization strategy for a fixed batch size using $P$ processes. Our method is inspired by the communication-avoiding algorithms in numerical linear algebra. We see $P$ processes as logically divided into a $P_r \times P_c$ grid where the $P_r$ dimension is implicitly responsible for model/domain parallelism and the $P_c$ dimension is implicitly responsible for batch parallelism. In practice, the integrated matrix-based parallel algorithm encapsulates these types of parallelism automatically. We analyze the communication complexity and analytically demonstrate that the lowest communication costs are often achieved neither with pure model nor with pure data parallelism. We also show how the domain parallel approach can help in extending the theoretical scaling limit of the typical batch parallel method.},
doi = {10.1145/3210377.3210394},
place = {United States},
year = {2018},
month = {6}
}
