Achieving balanced execution through runtime detection of performance variation
Abstract
Systems, apparatuses, and methods for achieving balanced execution in a multi-node cluster through runtime detection of performance variation are described. During a training phase, performance counters and an amount of time spent waiting for synchronization is monitored for a plurality of tasks for each node of the multi-node cluster. These values are utilized to generate a model which correlates the values of the performance counters to the amount of time spent waiting for synchronization. Once the model is built, the values of the performance counters are monitored for a period of time at the start of each task, and these values are input into the model. The model generates a prediction of whether a given node is on the critical path. If the given node is predicted to be on the critical path, the power allocation of the given node is increased.
- Inventors:
- Issue Date:
- Research Org.:
- Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
- Sponsoring Org.:
- USDOE
- OSTI Identifier:
- 1650744
- Patent Number(s):
- 10613957
- Application Number:
- 15/192,764
- Assignee:
- Advanced Micro Devices, Inc. (Santa Clara, CA)
- Patent Classifications (CPCs):
-
G - PHYSICS G06 - COMPUTING G06F - ELECTRIC DIGITAL DATA PROCESSING
Y - NEW / CROSS SECTIONAL TECHNOLOGIES Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THIR OWN ENERGY USE
- DOE Contract Number:
- AC02-05CH11231
- Resource Type:
- Patent
- Resource Relation:
- Patent File Date: 06/24/2016
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 97 MATHEMATICS AND COMPUTING
Citation Formats
Kocoloski, Brian J., Piga, Leonardo, Huang, Wei, and Paul, Indrani. Achieving balanced execution through runtime detection of performance variation. United States: N. p., 2020.
Web.
Kocoloski, Brian J., Piga, Leonardo, Huang, Wei, & Paul, Indrani. Achieving balanced execution through runtime detection of performance variation. United States.
Kocoloski, Brian J., Piga, Leonardo, Huang, Wei, and Paul, Indrani. Tue .
"Achieving balanced execution through runtime detection of performance variation". United States. https://www.osti.gov/servlets/purl/1650744.
@article{osti_1650744,
title = {Achieving balanced execution through runtime detection of performance variation},
author = {Kocoloski, Brian J. and Piga, Leonardo and Huang, Wei and Paul, Indrani},
abstractNote = {Systems, apparatuses, and methods for achieving balanced execution in a multi-node cluster through runtime detection of performance variation are described. During a training phase, performance counters and an amount of time spent waiting for synchronization is monitored for a plurality of tasks for each node of the multi-node cluster. These values are utilized to generate a model which correlates the values of the performance counters to the amount of time spent waiting for synchronization. Once the model is built, the values of the performance counters are monitored for a period of time at the start of each task, and these values are input into the model. The model generates a prediction of whether a given node is on the critical path. If the given node is predicted to be on the critical path, the power allocation of the given node is increased.},
doi = {},
journal = {},
number = ,
volume = ,
place = {United States},
year = {2020},
month = {4}
}