The Case for Strong Scaling in Deep Learning: Training Large 3D CNNs with Hybrid Parallelism
Journal Article · IEEE Transactions on Parallel and Distributed Systems
- Author Affiliations:
- Tokyo Institute of Technology (Japan); Lawrence Livermore National Laboratory (LLNL), Livermore, CA (United States)
- Lawrence Livermore National Laboratory (LLNL), Livermore, CA (United States)
- Eidgenoessische Technische Hochschule (ETH), Zurich (Switzerland); Lawrence Livermore National Laboratory (LLNL), Livermore, CA (United States)
- University of Oregon, Eugene, OR (United States); Lawrence Livermore National Laboratory (LLNL), Livermore, CA (United States)
- Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
- RIKEN Center for Computational Science, Hyogo (Japan); Tokyo Institute of Technology (Japan)
- Abstract:
Here, we present scalable hybrid-parallel algorithms for training large-scale 3D convolutional neural networks. Emerging deep learning-based scientific workflows often require model training with large, high-dimensional samples, which can make training much more costly, or even infeasible, due to excessive memory usage. We address these challenges by applying hybrid parallelism throughout the end-to-end training pipeline, including both computation and I/O. Our hybrid-parallel algorithm extends standard data parallelism with spatial parallelism, which partitions a single sample in the spatial domain, realizing strong scaling beyond the mini-batch dimension with a larger aggregate memory capacity. We evaluate the proposed training algorithms with two challenging 3D CNNs, CosmoFlow and 3D U-Net. Our comprehensive performance studies show that good weak and strong scaling can be achieved for both networks using up to 2K GPUs. More importantly, we enable training of CosmoFlow with much larger samples than previously possible, realizing an order-of-magnitude improvement in prediction accuracy.
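The spatial-parallelism idea described in the abstract, partitioning a single 3D sample across processes so that each convolution only needs a thin halo of neighboring voxels, can be illustrated with a minimal single-process sketch. The sketch below is an assumption-laden toy (a plain 3x3x3 mean filter, a split along one spatial axis, NumPy in place of MPI and cuDNN), not the paper's LBANN implementation; the helper names `box3` and `spatial_partition_conv` are hypothetical. It only demonstrates why local sub-volume convolutions plus one-voxel halos reproduce the full-volume result.

```python
# Toy illustration of spatial parallelism for a 3D convolution (assumed names,
# not the paper's implementation): split one sample along z, give each "rank"
# a one-voxel halo from its neighbors, convolve locally, and stitch the results.
import numpy as np


def box3(vol):
    """3x3x3 mean filter, zero-padded, 'same' output size."""
    z, y, x = vol.shape
    p = np.pad(vol, 1)
    out = np.zeros_like(vol, dtype=float)
    for dz in range(3):
        for dy in range(3):
            for dx in range(3):
                out += p[dz:dz + z, dy:dy + y, dx:dx + x]
    return out / 27.0


def spatial_partition_conv(vol, num_ranks):
    """Emulate spatial parallelism: partition `vol` along z, add halos, filter locally."""
    z = vol.shape[0]
    bounds = np.linspace(0, z, num_ranks + 1, dtype=int)
    pieces = []
    for r in range(num_ranks):
        lo, hi = bounds[r], bounds[r + 1]
        # Halo slabs: one z-slice from each neighbor, zeros at the global boundary.
        # In a distributed run this is the data exchanged between adjacent ranks.
        top = vol[lo - 1:lo] if lo > 0 else np.zeros((1,) + vol.shape[1:], vol.dtype)
        bot = vol[hi:hi + 1] if hi < z else np.zeros((1,) + vol.shape[1:], vol.dtype)
        local = np.concatenate([top, vol[lo:hi], bot], axis=0)
        # Local convolution; drop the halo rows of the output before stitching.
        pieces.append(box3(local)[1:-1])
    return np.concatenate(pieces, axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sample = rng.standard_normal((64, 32, 32))  # one large 3D sample
    assert np.allclose(box3(sample), spatial_partition_conv(sample, num_ranks=4))
    print("spatially partitioned convolution matches the full-volume result")
```

In an actual distributed run, the halo slabs would be exchanged between neighboring ranks (for example with MPI point-to-point messages) before each convolutional layer, and data parallelism would still operate across independent groups of such spatially partitioned ranks, which is the hybrid scheme the abstract refers to.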
- Research Organization:
- Lawrence Livermore National Laboratory (LLNL), Livermore, CA (United States)
- Sponsoring Organization:
- Exascale Computing Project; Japan Society for the Promotion of Science (JSPS); USDOE National Nuclear Security Administration (NNSA)
- Grant/Contract Number:
- AC02-05CH11231; AC52-07NA27344
- OSTI ID:
- 1959404
- Report Number(s):
- LLNL-JRNL-812691; 1019825
- Journal Information:
- IEEE Transactions on Parallel and Distributed Systems; Journal Issue: N/A; ISSN 1045-9219
- Publisher:
- IEEE
- Country of Publication:
- United States
- Language:
- English
References
- CosmoFlow: Using Deep Learning to Learn the Universe at Scale | conference | November 2018
- One weird trick for parallelizing convolutional neural networks | preprint | January 2014
- V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation | preprint | January 2016
- Infrastructure for Machine Learning: Ideas from Industry and Research | audiovisual | January 2019
- Mastering the game of Go with deep neural networks and tree search | journal | January 2016
- V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation | conference | October 2016
- Predicting statistics of asynchronous SGD parameters for a large-scale distributed deep learning system on GPU supercomputers | conference | December 2016
- ooc_cuDNN: Accommodating convolutional neural networks over GPU memory capacity | conference | December 2017
- Accelerating Deep Learning Frameworks with Micro-Batches | conference | September 2018
- Parallelizing Training of Deep Generative Models on Massive Scientific Datasets | conference | September 2019
- Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset | conference | July 2017
- Data sieving and collective I/O in ROMIO | conference | January 1999
- Towards Scalable Deep Learning via I/O Analysis and Optimization | conference | December 2017
- Learning Spatiotemporal Features with 3D Convolutional Networks | conference | December 2015
- Improving Strong-Scaling of CNN Training by Exploiting Finer-Grained Parallelism | conference | May 2019
- Entropy-Aware I/O Pipelining for Large-Scale Deep Learning on HPC Systems | conference | September 2018
- vDNN: Virtualized deep neural networks for scalable, memory-efficient neural network design | conference | October 2016
- Characterizing Deep-Learning I/O Workloads in TensorFlow | conference | November 2018
- Exascale Deep Learning for Climate Analytics | conference | November 2018
- Parallel netCDF: A High-Performance Scientific I/O Interface | conference | January 2003
- LBANN: livermore big artificial neural network HPC toolkit | conference | January 2015
- Superneurons | conference | February 2018
- Integrated Model, Batch, and Domain Parallelism in Training Neural Networks | conference | July 2018
- Channel and filter parallelism for large-scale CNN training | conference | November 2019
- Demystifying Parallel and Distributed Deep Learning: An In-depth Concurrency Analysis | journal | August 2019
- I/O Characterization and Performance Evaluation of BeeGFS for Deep Learning | conference | August 2019
- PipeDream: generalized pipeline parallelism for DNN training | conference | October 2019
- Optimization of Collective Communication Operations in MPICH | journal | February 2005
- New Approaches in Turbulence and Transition Modeling Using Data-driven Techniques | conference | January 2015