Performing process migration with allreduce operations
Abstract
Compute nodes perform allreduce operations that swap processes at nodes. A first allreduce operation generates a first result and uses a first process from a first compute node, a second process from a second compute node, and zeros from the other compute nodes. The first compute node replaces the first process with the first result. A second allreduce operation generates a second result and uses the first result from the first compute node, the second process from the second compute node, and zeros from the other compute nodes. The second compute node replaces the second process with the second result, which is the first process. A third allreduce operation generates a third result and uses the first result from the first compute node, the second result from the second compute node, and zeros from the other compute nodes. The first compute node replaces the first result with the third result, which is the second process.
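The three steps in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented implementation: the abstract does not name the reduction operator, so bitwise XOR is assumed here because it is consistent with the stated results (the second result equals the first process, the third equals the second process). The function names allreduce_xor and swap_via_allreduce are hypothetical, each process image is stood in for by a non-negative integer, and the allreduce is simulated in a single Python process rather than over a real network.

```python
from functools import reduce
from operator import xor


def allreduce_xor(contributions):
    """Simulate an allreduce: XOR every node's contribution into one shared result.

    Assumption: the reduction operator is bitwise XOR (not named in the abstract).
    """
    return reduce(xor, contributions, 0)


def swap_via_allreduce(node_data, a, b):
    """Swap the values held at nodes a and b with three simulated allreduce steps.

    node_data is a list of non-negative ints, one per compute node, each
    standing in for a process image (hypothetical representation).
    """
    data = list(node_data)

    def contributions():
        # Nodes a and b contribute what they currently hold; all others send zero.
        return [value if i in (a, b) else 0 for i, value in enumerate(data)]

    # First allreduce: first process XOR second process; node a keeps the result.
    data[a] = allreduce_xor(contributions())
    # Second allreduce: first result XOR second process = first process; node b keeps it.
    data[b] = allreduce_xor(contributions())
    # Third allreduce: first result XOR second result = second process; node a keeps it.
    data[a] = allreduce_xor(contributions())
    return data
```

For example, calling swap_via_allreduce([10, 6, 7, 9], 0, 1) exchanges the values at nodes 0 and 1 and leaves the values at nodes 2 and 3 untouched, since those nodes only ever contribute zeros and never store a result.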
 Inventors:

 Rochester, MN
 Plymouth, MN
 Issue Date:
 Research Org.:
 International Business Machines Corp., Armonk, NY (United States)
 Sponsoring Org.:
 USDOE
 OSTI Identifier:
 1009532
 Patent Number(s):
 7853639
 Application Number:
 11/531,175
 Assignee:
 International Business Machines Corporation (Armonk, NY)
 Patent Classifications (CPCs):

G PHYSICS; G06 COMPUTING; G06F ELECTRIC DIGITAL DATA PROCESSING
 DOE Contract Number:
 B519700
 Resource Type:
 Patent
 Country of Publication:
 United States
 Language:
 English
Citation Formats
Archer, Charles Jens, Peters, Amanda, and Wallenfelt, Brian Paul. Performing process migration with allreduce operations. United States: N. p., 2010. Web.
Archer, Charles Jens, Peters, Amanda, & Wallenfelt, Brian Paul. Performing process migration with allreduce operations. United States.
Archer, Charles Jens, Peters, Amanda, and Wallenfelt, Brian Paul. "Performing process migration with allreduce operations". United States. https://www.osti.gov/servlets/purl/1009532.
@article{osti_1009532,
title = {Performing process migration with allreduce operations},
author = {Archer, Charles Jens and Peters, Amanda and Wallenfelt, Brian Paul},
abstractNote = {Compute nodes perform allreduce operations that swap processes at nodes. A first allreduce operation generates a first result and uses a first process from a first compute node, a second process from a second compute node, and zeros from the other compute nodes. The first compute node replaces the first process with the first result. A second allreduce operation generates a second result and uses the first result from the first compute node, the second process from the second compute node, and zeros from the other compute nodes. The second compute node replaces the second process with the second result, which is the first process. A third allreduce operation generates a third result and uses the first result from the first compute node, the second result from the second compute node, and zeros from the other compute nodes. The first compute node replaces the first result with the third result, which is the second process.},
place = {United States},
year = {2010},
month = {12}
}
Works referenced in this record:
The Autopilot performance-directed adaptive control system
journal, September 2001
 Ribler, Randy L.; Simitci, Huseyin; Reed, Daniel A.
 Future Generation Computer Systems, Vol. 18, Issue 1, p. 175-187
Automated cluster-based web service performance tuning
conference, January 2004
 Chung, I-Hsin; Hollingsworth, J. K.
 Proceedings. 13th IEEE International Symposium on High performance Distributed Computing, 2004.