DOE Patents, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Achieving balanced execution through runtime detection of performance variation

Abstract

Systems, apparatuses, and methods for achieving balanced execution in a multi-node cluster through runtime detection of performance variation are described. During a training phase, performance counters and the amount of time spent waiting for synchronization are monitored for a plurality of tasks on each node of the multi-node cluster. These values are used to generate a model that correlates the values of the performance counters with the amount of time spent waiting for synchronization. Once the model is built, the values of the performance counters are monitored for a period of time at the start of each task, and these values are input into the model. The model generates a prediction of whether a given node is on the critical path. If the given node is predicted to be on the critical path, the power allocation of the given node is increased.
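The training and runtime phases described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the abstract does not specify the model type, so ordinary least-squares linear regression is used here as a stand-in, and the names `TrainingSample`, `train_wait_model`, `predict_wait`, `adjust_power`, `wait_threshold`, and `step_watts` are all hypothetical. A node predicted to spend little or no time waiting at the next synchronization point is treated as the one the other nodes wait for, i.e. the node on the critical path.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class TrainingSample:
    # Hypothetical record: counter values sampled at the start of a task
    # (feature choice is an assumption) and the synchronization wait time
    # measured for that task on that node.
    counters: List[float]
    wait_time: float


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system; collect more varied samples")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def train_wait_model(samples: Sequence[TrainingSample], ridge: float = 1e-6) -> List[float]:
    """Training phase (assumed linear model): fit wait_time ~ w0 + w . counters."""
    rows = [[1.0] + list(s.counters) for s in samples]  # prepend a bias column
    k = len(rows[0])
    # Normal equations with a tiny ridge term for numerical stability:
    # (X^T X + ridge * I) w = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
            for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * s.wait_time for r, s in zip(rows, samples)) for i in range(k)]
    return _solve(xtx, xty)


def predict_wait(weights: List[float], counters: List[float]) -> float:
    """Predicted synchronization wait time for one node's early-task counters."""
    return weights[0] + sum(w * c for w, c in zip(weights[1:], counters))


def adjust_power(weights: List[float], node_counters: Dict[str, List[float]],
                 power_caps: Dict[str, float], wait_threshold: float,
                 step_watts: float) -> Dict[str, float]:
    """Runtime phase: raise the power cap of nodes predicted to be on the critical path.

    node_counters is assumed to hold counter values already aggregated over
    the monitoring window at the start of the current task.
    """
    new_caps = dict(power_caps)
    for node, counters in node_counters.items():
        if predict_wait(weights, counters) <= wait_threshold:
            new_caps[node] = power_caps[node] + step_watts
    return new_caps
```

As a usage example, training on synthetic samples where a single hypothetical counter (say, a memory-stall fraction) lowers wait time linearly yields weights close to `[1.0, -2.0]`. At runtime, a node reporting `[0.45]` is predicted to wait about 0.1 s and, with `wait_threshold=0.15`, has its cap raised from 200 W to 220 W, while a node reporting `[0.1]` (predicted wait about 0.8 s) keeps its cap.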

Inventors:
Kocoloski, Brian J.; Piga, Leonardo; Huang, Wei; Paul, Indrani
Issue Date:
April 2020
Research Org.:
Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1650744
Patent Number(s):
10613957
Application Number:
15/192,764
Assignee:
Advanced Micro Devices, Inc. (Santa Clara, CA)
Patent Classifications (CPCs):
G - PHYSICS G06 - COMPUTING G06F - ELECTRIC DIGITAL DATA PROCESSING
Y - NEW / CROSS SECTIONAL TECHNOLOGIES Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
DOE Contract Number:  
AC02-05CH11231
Resource Type:
Patent
Resource Relation:
Patent File Date: 06/24/2016
Country of Publication:
United States
Language:
English

Citation Formats

Kocoloski, Brian J., Piga, Leonardo, Huang, Wei, and Paul, Indrani. Achieving balanced execution through runtime detection of performance variation. United States: N. p., 2020. Web.
Kocoloski, Brian J., Piga, Leonardo, Huang, Wei, & Paul, Indrani. Achieving balanced execution through runtime detection of performance variation. United States.
Kocoloski, Brian J., Piga, Leonardo, Huang, Wei, and Paul, Indrani. 2020. "Achieving balanced execution through runtime detection of performance variation". United States. https://www.osti.gov/servlets/purl/1650744.
@article{osti_1650744,
title = {Achieving balanced execution through runtime detection of performance variation},
author = {Kocoloski, Brian J. and Piga, Leonardo and Huang, Wei and Paul, Indrani},
abstractNote = {Systems, apparatuses, and methods for achieving balanced execution in a multi-node cluster through runtime detection of performance variation are described. During a training phase, performance counters and the amount of time spent waiting for synchronization are monitored for a plurality of tasks on each node of the multi-node cluster. These values are used to generate a model that correlates the values of the performance counters with the amount of time spent waiting for synchronization. Once the model is built, the values of the performance counters are monitored for a period of time at the start of each task, and these values are input into the model. The model generates a prediction of whether a given node is on the critical path. If the given node is predicted to be on the critical path, the power allocation of the given node is increased.},
doi = {},
journal = {},
number = {},
volume = {},
place = {United States},
year = {2020},
month = {4}
}
