The Case for Strong Scaling in Deep Learning: Training Large 3D CNNs with Hybrid Parallelism

Journal Article · IEEE Transactions on Parallel and Distributed Systems
Oyama, Yosuke [1]; Maruyama, Naoya [2]; Dryden, Nikoli [3]; McCarthy, Erin [4]; Harrington, Peter [5]; Balewski, Jan [5]; Matsuoka, Satoshi [6]; Nugent, Peter [5]; Van Essen, Brian [2]
  1. Tokyo Institute of Technology (Japan); Lawrence Livermore National Laboratory (LLNL), Livermore, CA (United States)
  2. Lawrence Livermore National Laboratory (LLNL), Livermore, CA (United States)
  3. Eidgenoessische Technische Hochschule (ETH), Zurich (Switzerland); Lawrence Livermore National Laboratory (LLNL), Livermore, CA (United States)
  4. University of Oregon, Eugene, OR (United States); Lawrence Livermore National Laboratory (LLNL), Livermore, CA (United States)
  5. Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
  6. RIKEN Center for Computational Science, Hyogo (Japan); Tokyo Institute of Technology (Japan)
Here, we present scalable hybrid-parallel algorithms for training large-scale 3D convolutional neural networks. Emerging deep learning-based scientific workflows often require model training with large, high-dimensional samples, which can make training much more costly and even infeasible due to excessive memory usage. We address these challenges by extensively applying hybrid parallelism throughout the end-to-end training pipeline, including both computation and I/O. Our hybrid-parallel algorithm extends standard data parallelism with spatial parallelism, which partitions a single sample in the spatial domain, realizing strong scaling beyond the mini-batch dimension with a larger aggregated memory capacity. We evaluate our proposed training algorithms with two challenging 3D CNNs, CosmoFlow and 3D U-Net. Our comprehensive performance studies show that good weak and strong scaling can be achieved for both networks using up to 2K GPUs. More importantly, we enable training of CosmoFlow with much larger samples than previously possible, realizing an order-of-magnitude improvement in prediction accuracy.
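
To make the spatial-parallelism idea concrete, the following is a minimal, self-contained NumPy sketch, not the authors' implementation: one 3D sample is partitioned along a spatial axis across a few simulated workers, each shard borrows a halo of depth r from its neighbors (plain slicing here, point-to-point communication in a real distributed run), and the stitched local outputs match the serial result. The names (box_filter_3d, spatially_parallel_filter) and the toy box filter standing in for a convolution layer are illustrative assumptions.

import numpy as np

def box_filter_3d(vol, r=1):
    # "Valid" (2r+1)^3 box filter: a stand-in for one 3D convolution layer.
    k = 2 * r + 1
    out_shape = tuple(s - 2 * r for s in vol.shape)
    out = np.zeros(out_shape)
    for dz in range(k):
        for dy in range(k):
            for dx in range(k):
                out += vol[dz:dz + out_shape[0],
                           dy:dy + out_shape[1],
                           dx:dx + out_shape[2]]
    return out / k ** 3

def spatially_parallel_filter(vol, n_workers, r=1):
    # Partition the sample along axis 0; each shard also takes a halo of
    # depth r from its neighbors (simulated by slicing), then filters locally.
    bounds = np.linspace(0, vol.shape[0], n_workers + 1, dtype=int)
    local_outputs = []
    for w in range(n_workers):
        lo, hi = bounds[w], bounds[w + 1]
        shard = vol[max(lo - r, 0):min(hi + r, vol.shape[0])]
        local_outputs.append(box_filter_3d(shard, r))
    # Stitching the shard outputs along the partitioned axis reproduces
    # the serial result exactly.
    return np.concatenate(local_outputs, axis=0)

rng = np.random.default_rng(0)
sample = rng.standard_normal((32, 16, 16))   # one toy 3D sample
serial = box_filter_3d(sample)
parallel = spatially_parallel_filter(sample, n_workers=4)
assert np.allclose(serial, parallel)
print("stitched spatial-parallel output matches serial:", parallel.shape)

Data parallelism composes with this scheme by assigning different mini-batch shards to different such worker groups and all-reducing gradients across them, which is the hybrid combination the abstract describes.
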
Research Organization:
Lawrence Livermore National Laboratory (LLNL), Livermore, CA (United States)
Sponsoring Organization:
Exascale Computing Project; Japan Society for the Promotion of Science (JSPS); USDOE National Nuclear Security Administration (NNSA)
Grant/Contract Number:
AC02-05CH11231; AC52-07NA27344
OSTI ID:
1959404
Report Number(s):
LLNL-JRNL-812691; 1019825
Journal Information:
IEEE Transactions on Parallel and Distributed Systems; ISSN 1045-9219
Publisher:
IEEE
Country of Publication:
United States
Language:
English

References (29)

CosmoFlow: Using Deep Learning to Learn the Universe at Scale
  • Mathuriya, Amrita; Bard, Deborah; Mendygral, Peter
  • SC18: International Conference for High Performance Computing, Networking, Storage and Analysis https://doi.org/10.1109/sc.2018.00068
conference · November 2018
One weird trick for parallelizing convolutional neural networks · preprint · January 2014
V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation · preprint · January 2016
Infrastructure for Machine Learning: Ideas from Industry and Research · audiovisual · January 2019
Mastering the game of Go with deep neural networks and tree search · journal · January 2016
V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation · conference · October 2016
Predicting statistics of asynchronous SGD parameters for a large-scale distributed deep learning system on GPU supercomputers · conference · December 2016
ooc_cuDNN: Accommodating convolutional neural networks over GPU memory capacity · conference · December 2017
Accelerating Deep Learning Frameworks with Micro-Batches · conference · September 2018
Parallelizing Training of Deep Generative Models on Massive Scientific Datasets · conference · September 2019
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset · conference · July 2017
Data sieving and collective I/O in ROMIO · conference · January 1999
Towards Scalable Deep Learning via I/O Analysis and Optimization
  • Pumma, Sarunya; Si, Min; Feng, Wu-chun
  • 2017 IEEE 19th International Conference on High Performance Computing and Communications; IEEE 15th International Conference on Smart City; IEEE 3rd International Conference on Data Science and Systems (HPCC/SmartCity/DSS) https://doi.org/10.1109/HPCC-SmartCity-DSS.2017.29
conference · December 2017
Learning Spatiotemporal Features with 3D Convolutional Networks · conference · December 2015
Improving Strong-Scaling of CNN Training by Exploiting Finer-Grained Parallelism · conference · May 2019
Entropy-Aware I/O Pipelining for Large-Scale Deep Learning on HPC Systems
  • Zhu, Yue; Chowdhury, Fahim; Fu, Huansong
  • 2018 IEEE 26th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS) https://doi.org/10.1109/MASCOTS.2018.00023
conference · September 2018
vDNN: Virtualized deep neural networks for scalable, memory-efficient neural network design · conference · October 2016
Characterizing Deep-Learning I/O Workloads in TensorFlow
  • Chien, Steven W. D.; Markidis, Stefano; Sishtla, Chaitanya Prasad
  • 2018 IEEE/ACM 3rd International Workshop on Parallel Data Storage & Data Intensive Scalable Computing Systems (PDSW-DISCS) https://doi.org/10.1109/PDSW-DISCS.2018.00011
conference · November 2018
Exascale Deep Learning for Climate Analytics · conference · November 2018
Parallel netCDF: A High-Performance Scientific I/O Interface · conference · January 2003
LBANN: livermore big artificial neural network HPC toolkit · conference · January 2015
Superneurons · conference · February 2018
Integrated Model, Batch, and Domain Parallelism in Training Neural Networks · conference · July 2018
Channel and filter parallelism for large-scale CNN training · conference · November 2019
Demystifying Parallel and Distributed Deep Learning: An In-depth Concurrency Analysis · journal · August 2019
I/O Characterization and Performance Evaluation of BeeGFS for Deep Learning · conference · August 2019
PipeDream: generalized pipeline parallelism for DNN training
  • Narayanan, Deepak; Harlap, Aaron; Phanishayee, Amar
  • SOSP '19: ACM SIGOPS 27th Symposium on Operating Systems Principles, Proceedings of the 27th ACM Symposium on Operating Systems Principles https://doi.org/10.1145/3341301.3359646
conference · October 2019
Optimization of Collective Communication Operations in MPICH · journal · February 2005
New Approaches in Turbulence and Transition Modeling Using Data-driven Techniques · conference · January 2015

Similar Records

GSplit: Scaling Graph Neural Network Training on Large Graphs via Split-Parallelism
Conference · May 2025 · OSTI ID: 3002431

Wootz: a compiler-based framework for fast CNN pruning via composability
Conference · June 2019 · OSTI ID: 1543204

In-Place Zero-Space Memory Protection for CNN
Conference · December 2019 · OSTI ID: 1606858