Determining collective barrier operation skew in a parallel computer
Abstract
Determining collective barrier operation skew in a parallel computer that includes a number of compute nodes organized into an operational group includes, for each of the nodes until every node has been selected as the delayed node: selecting one of the nodes as the delayed node; entering, by each node other than the delayed node, a collective barrier operation; entering, by the delayed node after a delay, the collective barrier operation; receiving an exit signal from a root of the collective barrier operation; and measuring, for the delayed node, a barrier completion time. The barrier operation skew is then calculated by identifying, from the compute nodes' barrier completion times, a maximum barrier completion time and a minimum barrier completion time, and calculating the barrier operation skew as the difference between the maximum and minimum barrier completion times.
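The procedure in the abstract can be sketched in Python. The abstract does not name a barrier algorithm, interconnect, or timer start point, so this is only a simulation under stated assumptions: a binary combining tree rooted at node 0 with a fixed per-hop latency, a delay long enough that the delayed node always enters last, and a completion time measured from the delayed node's own entry until it receives the exit signal. `barrier_skew`, `simulated_completion_time`, `tree_depth`, and the microsecond parameters are hypothetical. On a real machine, `measure_completion` would instead wrap a timed barrier call (for example `MPI_Barrier`) on the delayed node.

```python
from typing import Callable, List, Tuple


def tree_depth(node: int) -> int:
    """Depth of `node` in a binary tree stored in heap order (root = node 0)."""
    return (node + 1).bit_length() - 1


def simulated_completion_time(delayed: int, num_nodes: int,
                              delay_us: float = 100.0,
                              hop_us: float = 1.0,
                              root_us: float = 0.5) -> float:
    """Model one barrier round in which only `delayed` enters late.

    Hypothetical combining-tree model: each interior node forwards its
    arrival notice one hop toward the root once its subtree has arrived,
    so the root is ready at max_i(entry_i + depth_i * hop_us). The root
    then spends root_us and broadcasts the exit signal back down the tree.
    """
    max_depth = tree_depth(num_nodes - 1)
    if delay_us < max_depth * hop_us:
        raise ValueError("delay too short: delayed node may not arrive last")
    entry = [0.0] * num_nodes          # all other nodes enter at t = 0
    entry[delayed] = delay_us          # the delayed node enters after the delay
    root_ready = max(entry[i] + tree_depth(i) * hop_us for i in range(num_nodes))
    exit_at = root_ready + root_us + tree_depth(delayed) * hop_us
    # Completion time is measured from the delayed node's own entry.
    return exit_at - entry[delayed]


def barrier_skew(num_nodes: int,
                 measure_completion: Callable[[int], float]) -> Tuple[float, List[float]]:
    """Select each node in turn as the delayed node, record its barrier
    completion time, and return (max - min, per-node times)."""
    times = [measure_completion(delayed) for delayed in range(num_nodes)]
    return max(times) - min(times), times


if __name__ == "__main__":
    n = 7
    skew, times = barrier_skew(n, lambda d: simulated_completion_time(d, n))
    print(times)  # [0.5, 2.5, 2.5, 4.5, 4.5, 4.5, 4.5]
    print(skew)   # 4.0
```

In this model the skew works out to 2 × max_depth × hop_us, the round trip between the root and the deepest leaf, so it reflects how much a node's exit time depends on its position in the barrier tree.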
- Inventors:
- Faraj, Daniel A.
- Issue Date:
- 2015 Nov
- Research Org.:
- International Business Machines Corp., Armonk, NY (United States)
- Sponsoring Org.:
- USDOE
- OSTI Identifier:
- 1226811
- Patent Number(s):
- 9195517
- Application Number:
- 13/685,869
- Assignee:
- International Business Machines Corporation (Armonk, NY)
- Patent Classifications (CPCs):
- G - PHYSICS; G06 - COMPUTING; G06F - ELECTRIC DIGITAL DATA PROCESSING
- DOE Contract Number:
- B554331
- Resource Type:
- Patent
- Resource Relation:
- Patent File Date: 2012 Nov 27
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 97 MATHEMATICS AND COMPUTING
Citation Formats
Faraj, Daniel A. Determining collective barrier operation skew in a parallel computer. United States: N. p., 2015. Web.
Faraj, Daniel A. Determining collective barrier operation skew in a parallel computer. United States.
Faraj, Daniel A. 2015. "Determining collective barrier operation skew in a parallel computer". United States. https://www.osti.gov/servlets/purl/1226811.
@article{osti_1226811,
title = {Determining collective barrier operation skew in a parallel computer},
author = {Faraj, Daniel A.},
abstractNote = {Determining collective barrier operation skew in a parallel computer that includes a number of compute nodes organized into an operational group includes: for each of the nodes until each node has been selected as a delayed node: selecting one of the nodes as a delayed node; entering, by each node other than the delayed node, a collective barrier operation; entering, after a delay by the delayed node, the collective barrier operation; receiving an exit signal from a root of the collective barrier operation; and measuring, for the delayed node, a barrier completion time. The barrier operation skew is calculated by: identifying, from the compute nodes' barrier completion times, a maximum barrier completion time and a minimum barrier completion time and calculating the barrier operation skew as the difference of the maximum and the minimum barrier completion time.},
doi = {},
journal = {},
number = {},
volume = {},
place = {United States},
year = {2015},
month = {11}
}