DOE Patents
U.S. Department of Energy, Office of Scientific and Technical Information

Title: Optimized asynchronous training of neural networks using a distributed parameter server with eager updates

Abstract

A method of training a neural network includes, at a local computing node, receiving remote parameters from a set of one or more remote computing nodes, initiating execution of a forward pass in a local neural network in the local computing node to determine a final output based on the remote parameters, initiating execution of a backward pass in the local neural network to determine updated parameters for the local neural network, and prior to completion of the backward pass, transmitting a subset of the updated parameters to the set of remote computing nodes.
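The key idea in the abstract is that a node does not wait for the entire backward pass to finish before communicating: as soon as the gradients for a given layer are computed, that layer's updated parameters are sent out, overlapping communication with the rest of backpropagation. The following is a minimal sketch of that scheme, not the patented implementation; the linear-layer model, MSE loss, learning rate, and the `send` callback (standing in for transmission to remote nodes or a parameter server) are illustrative assumptions.

```python
import numpy as np

def train_step(params, x, target, lr, send):
    """One forward/backward pass over a stack of linear layers.

    params : list of weight matrices, input layer first.
    send   : callback invoked with (layer_index, updated_weights) as soon
             as each layer's update is ready -- i.e. before the backward
             pass has reached the earlier layers ("eager updates").
    """
    # Forward pass: keep each layer's input for use in backprop.
    activations = [x]
    for w in params:
        activations.append(activations[-1] @ w)

    # Backward pass, last layer first. Each layer's update is pushed out
    # immediately, overlapping communication with the remaining backprop.
    grad = 2.0 * (activations[-1] - target)   # dLoss/dOutput for MSE
    for i in reversed(range(len(params))):
        grad_w = activations[i].T @ grad      # gradient w.r.t. this layer's weights
        grad = grad @ params[i].T             # gradient w.r.t. this layer's input
        params[i] = params[i] - lr * grad_w
        send(i, params[i])                    # eager transmission of a subset
    return params
```

Because the backward pass visits layers in reverse order, `send` fires for the last layer first; in a distributed setting this lets remote nodes begin applying late-layer updates while the local node is still backpropagating through earlier layers.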

Inventors:
Hamidouche, Khaled; LeBeane, Michael W.; Benton, Walter B.; Chu, Michael L.
Issue Date:
04/2023
Research Org.:
Lawrence Livermore National Laboratory (LLNL), Livermore, CA (United States); Advanced Micro Devices, Inc., Santa Clara, CA (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1998297
Patent Number(s):
11630994
Application Number:
15/898,433
Assignee:
Advanced Micro Devices, Inc. (Santa Clara, CA)
DOE Contract Number:
AC52-07NA27344; B620717
Resource Type:
Patent
Resource Relation:
Patent File Date: 02/17/2018
Country of Publication:
United States
Language:
English

Citation Formats

Hamidouche, Khaled, LeBeane, Michael W., Benton, Walter B., and Chu, Michael L. Optimized asynchronous training of neural networks using a distributed parameter server with eager updates. United States: N. p., 2023. Web.
Hamidouche, Khaled, LeBeane, Michael W., Benton, Walter B., & Chu, Michael L. Optimized asynchronous training of neural networks using a distributed parameter server with eager updates. United States.
Hamidouche, Khaled, LeBeane, Michael W., Benton, Walter B., and Chu, Michael L. "Optimized asynchronous training of neural networks using a distributed parameter server with eager updates". United States. https://www.osti.gov/servlets/purl/1998297.
@article{osti_1998297,
title = {Optimized asynchronous training of neural networks using a distributed parameter server with eager updates},
author = {Hamidouche, Khaled and LeBeane, Michael W. and Benton, Walter B. and Chu, Michael L.},
abstractNote = {A method of training a neural network includes, at a local computing node, receiving remote parameters from a set of one or more remote computing nodes, initiating execution of a forward pass in a local neural network in the local computing node to determine a final output based on the remote parameters, initiating execution of a backward pass in the local neural network to determine updated parameters for the local neural network, and prior to completion of the backward pass, transmitting a subset of the updated parameters to the set of remote computing nodes.},
url = {https://www.osti.gov/servlets/purl/1998297},
place = {United States},
year = {2023},
month = {4}
}

Works referenced in this record:

System and method of detecting lost video data packets (patent, August 2010)
Fine-Grain Compute Communication Execution for Deep Learning Frameworks (patent application, November 2018)
Deep Learning Training System (patent application, November 2015)
Re-Designing CNTK Deep Learning Framework on Modern GPU Enabled Clusters (conference, December 2016)
Systems and methods for remote interaction (patent, March 2021)
S-Caffe (journal, January 2017)
Systems and Methods for Exchange of Data in Distributed Training of Machine Learning Algorithms (patent application, May 2019)
Model Parallel Processing Method and Apparatus Based on Multiple Graphic Processing Units (patent application, November 2016)
Data Storage Method and Network Interface Card (patent application, November 2016)