Anderson Acceleration for Distributed Training of Deep Learning Models
Conference · OSTI ID:1866678 · ORNL
Anderson acceleration (AA) is an extrapolation technique that has recently gained interest in the deep learning (DL) community as a way to speed up the sequential training of DL models. However, when performed at large scale, DL training is exposed to a higher risk of becoming trapped in steep local minima of the training loss function, and standard AA does not provide sufficient acceleration to escape from these steep local minima. This results in poor generalizability and makes AA ineffective. To restore AA's advantage in speeding up the training of DL models on large-scale computing platforms, we combine AA with an adaptive moving-average procedure that boosts the training out of steep local minima. By monitoring the relative standard deviation between consecutive iterations, we also introduce a criterion that automatically assesses whether the moving average is needed. We applied the method to the following DL instantiations for image classification: (i) ResNet50 trained on the open-source CIFAR100 dataset and (ii) ResNet50 trained on the open-source ImageNet1k dataset. Numerical results obtained using up to 1,536 NVIDIA V100 GPUs on the OLCF supercomputer Summit showed the stabilizing effect of the moving average on AA for all the problems above.
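The abstract does not give pseudocode, so the following is only a minimal sketch of the two ingredients it names, applied to a generic fixed-point iteration x → g(x) rather than to the paper's distributed DL training: a standard Anderson-acceleration step, plus a moving average that is switched on when the relative standard deviation (RSD) of recent iterates exceeds a threshold. All names (`anderson_step`, `aa_with_moving_average`, `rsd_tol`, `window`) are hypothetical and not from the paper.

```python
import numpy as np

def anderson_step(x_hist, g_hist, m=5, reg=1e-10):
    """One Anderson-acceleration step for the fixed-point map x -> g(x).

    Uses the last min(m+1, k) iterate/image pairs, finds mixing weights
    alpha (summing to 1) that minimize the norm of the combined residual
    f_i = g(x_i) - x_i, and returns the extrapolation sum_i alpha_i g(x_i).
    """
    n = min(m + 1, len(x_hist))  # number of history pairs to use
    F = np.column_stack([g_hist[-i - 1] - x_hist[-i - 1] for i in range(n)])
    # Regularized normal equations for min ||F a|| subject to sum(a) = 1:
    # solve (F^T F + reg*I) a = 1, then rescale so the weights sum to 1.
    a = np.linalg.solve(F.T @ F + reg * np.eye(n), np.ones(n))
    a /= a.sum()
    return sum(a[i] * g_hist[-i - 1] for i in range(n))

def aa_with_moving_average(g, x0, iters=50, m=5, window=3, rsd_tol=0.1):
    """AA iteration with an adaptive moving average (hypothetical sketch).

    After each AA step, the relative standard deviation of the norms of
    the most recent iterates is checked; if it exceeds rsd_tol, the new
    iterate is replaced by the moving average of those iterates.
    """
    x_hist = [np.asarray(x0, dtype=float)]
    g_hist = [g(x_hist[0])]
    for _ in range(iters):
        x_new = anderson_step(x_hist, g_hist, m=m)
        recent = x_hist[-window:] + [x_new]
        norms = [np.linalg.norm(v) for v in recent]
        rsd = np.std(norms) / (np.mean(norms) + 1e-12)
        if rsd > rsd_tol:                    # iterates still fluctuating:
            x_new = np.mean(recent, axis=0)  # stabilize with the average
        x_hist.append(x_new)
        g_hist.append(g(x_new))
    return x_hist[-1]
```

As a toy check, running this on the affine contraction g(x) = 0.5x + b recovers the fixed point x* = 2b: the moving average damps the early, widely varying iterates, and once the RSD drops below the threshold the pure AA step extrapolates directly to the solution.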
- Research Organization: Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
- Sponsoring Organization: USDOE
- DOE Contract Number: AC05-00OR22725
- OSTI ID: 1866678
- Country of Publication: United States
- Language: English
Similar Records
- Scalable balanced training of conditional generative adversarial neural networks on image data
- Stable parallel training of Wasserstein conditional generative adversarial neural networks · Journal Article · Apr 2021 · Journal of Supercomputing · OSTI ID:1783019
- Stable parallel training of Wasserstein conditional generative adversarial neural networks · Journal Article · Aug 2022 · Journal of Supercomputing · OSTI ID:1908079
- Stable Parallel Training of Wasserstein Conditional Generative Adversarial Neural Networks (Full/Regular Research Paper submission for the symposium CSCI-ISAI: Artificial Intelligence) · Conference · Nov 2021 · OSTI ID:1877492