
Data optimization for large batch distributed training of deep neural networks

Conference

Distributed training in deep learning (DL) is common practice as data and models grow. The current practice for distributed training of deep neural networks faces two challenges at scale: communication bottlenecks and deteriorating model accuracy as the global batch size increases. Existing solutions focus on improving message-exchange efficiency and on techniques that adjust batch sizes and models during training. The loss of training accuracy typically happens because the loss function gets trapped in a local minimum. We observe that the loss landscape is shaped by both the model and the training data, and we propose a data optimization approach that uses machine learning to implicitly smooth the loss landscape, resulting in fewer local minima. Our approach filters out data points that are less important to feature learning, enabling us to speed up the training of models at larger batch sizes while improving accuracy.
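The abstract does not describe the filtering criterion in detail, but the general idea of pruning less informative examples before forming large global batches can be illustrated with a minimal sketch. The snippet below is an assumption-laden illustration, not the authors' method: it uses per-sample loss under the current model as a hypothetical importance score and keeps only the highest-scoring fraction of the dataset; the function name, `keep_fraction`, and the loss-based score are all placeholders for whatever learned criterion the paper actually uses.

```python
# Hedged sketch of importance-based data filtering for large-batch training.
# ASSUMPTION: per-sample loss is used as a stand-in importance score; the
# paper's approach learns its own criterion with machine learning.
import torch
from torch.utils.data import DataLoader, Subset


def filter_dataset(model, dataset, keep_fraction=0.8, device="cpu", batch_size=512):
    """Return a Subset keeping the `keep_fraction` most informative examples."""
    model.eval()
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=False)
    criterion = torch.nn.CrossEntropyLoss(reduction="none")  # per-sample losses

    scores = []
    with torch.no_grad():
        for inputs, targets in loader:
            inputs, targets = inputs.to(device), targets.to(device)
            losses = criterion(model(inputs), targets)  # one score per example
            scores.append(losses.cpu())
    scores = torch.cat(scores)

    # Keep the examples the model currently finds hardest; drop the "easy"
    # points assumed to contribute little to feature learning.
    keep = int(keep_fraction * len(dataset))
    kept_indices = torch.topk(scores, keep).indices.tolist()
    return Subset(dataset, kept_indices)
```

A filtered dataset produced this way could then be sharded across workers and trained with a larger global batch size; the choice of scoring proxy and retained fraction would need to be validated against the accuracy behavior the paper reports.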

Research Organization:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Sponsoring Organization:
USDOE; USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR) (SC-21)
DOE Contract Number:
AC05-00OR22725
OSTI ID:
1807275
Country of Publication:
United States
Language:
English

Similar Records

Integrated Model, Batch and Domain Parallelism in Training Neural Networks
Journal Article · Jun 14, 2018 · OSTI ID: 1454501

Anderson Acceleration for Distributed Training of Deep Learning Models
Conference · Feb 28, 2022 · OSTI ID: 1866678

Ramifications of Evolving Misbehaving Convolutional Neural Network Kernel and Batch Sizes
Conference · Nov 1, 2018 · OSTI ID: 1495999

Related Subjects