DOE Patents
U.S. Department of Energy, Office of Scientific and Technical Information

Title: Managing cluster-level performance variability without a centralized controller

Abstract

Systems, apparatuses, and methods for managing cluster-level performance variability without a centralized controller are described. Each node of a multi-node cluster tracks the maximum and minimum progress across the plurality of nodes for a workload executed by the cluster. Each node also tracks its local progress on its current task. Each node compares its local progress to the reported maximum and minimum progress across the cluster to identify a critical, or slow, node and to determine whether to increase or reduce the amount of power allocated to the node. The nodes append information about the maximum and minimum progress to messages sent to other nodes, sharing their knowledge of the maximum and minimum progress with those nodes. A node updates its local information if it receives a message from another node with more up-to-date information about the state of progress across the cluster.
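The scheme described in the abstract can be sketched as a small simulation. This is a minimal illustration under stated assumptions, not the patented implementation: the abstract does not say how a node decides that a report is "more up-to-date", so the sketch assumes a logical timestamp on each report. The names `Report`, `Node`, `stamp`, `owner`, `margin`, and `step`, the power units, and the threshold rule in `adjust_power` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    value: float  # fraction of the current task completed, 0.0 to 1.0
    stamp: int    # hypothetical logical clock; larger means more recent
    owner: str    # node whose local progress produced this value


class Node:
    def __init__(self, name, power=100.0, step=10.0, margin=0.05):
        self.name = name
        self.power = power    # hypothetical power allocation, arbitrary units
        self.step = step      # hypothetical size of one power adjustment
        self.margin = margin  # hypothetical tolerance when comparing progress
        self.local = 0.0
        self.max_seen = Report(0.0, 0, name)
        self.min_seen = Report(0.0, 0, name)

    def _fold_local(self, now):
        """Refresh a report when this node owns it or its progress beats it."""
        mine = Report(self.local, now, self.name)
        if self.max_seen.owner == self.name or self.local > self.max_seen.value:
            self.max_seen = mine
        if self.min_seen.owner == self.name or self.local < self.min_seen.value:
            self.min_seen = mine

    def record_progress(self, progress, now):
        self.local = progress
        self._fold_local(now)

    def outgoing(self, payload):
        """Append this node's max/min knowledge to an ordinary message."""
        return {"payload": payload, "max": self.max_seen, "min": self.min_seen}

    def receive(self, message, now):
        """Adopt any report that is more up to date than the local copy."""
        if message["max"].stamp > self.max_seen.stamp:
            self.max_seen = message["max"]
        if message["min"].stamp > self.min_seen.stamp:
            self.min_seen = message["min"]
        self._fold_local(now)

    def adjust_power(self):
        """Raise power on the slowest node, lower it on the fastest."""
        if self.max_seen.value - self.min_seen.value <= self.margin:
            return "hold"  # cluster looks balanced from this node's view
        if self.local <= self.min_seen.value + self.margin:
            self.power += self.step
            return "increase"
        if self.local >= self.max_seen.value - self.margin:
            self.power -= self.step
            return "decrease"
        return "hold"


# Three nodes at different points in their tasks; no central controller.
a, b, c = Node("a"), Node("b"), Node("c")
a.record_progress(0.2, now=1)
b.record_progress(0.8, now=2)
c.record_progress(0.5, now=3)

# Ordinary application messages carry the progress reports along.
a.receive(b.outgoing("data"), now=4)
b.receive(a.outgoing("data"), now=5)
c.receive(b.outgoing("data"), now=6)

decisions = {n.name: n.adjust_power() for n in (a, b, c)}
print(decisions)
```

In this run, node `a` learns from `b` that it is the slowest node, `b` learns the same from `a`, and `c` picks up both extremes second-hand from `b`. A timestamp-only freshness rule is the simplest reading of "more up-to-date"; a real system would likely need a more careful rule for conflicting reports.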

Inventors:
Piga, Leonardo
Issue Date:
Research Org.:
Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1568096
Patent Number(s):
10,237,335
Application Number:
15/183,625
Assignee:
Advanced Micro Devices, Inc. (Santa Clara, CA)
DOE Contract Number:
AC02-05CH11231
Resource Type:
Patent
Resource Relation:
Patent File Date: 06/15/2016
Country of Publication:
United States
Language:
English

Citation Formats

Piga, Leonardo. Managing cluster-level performance variability without a centralized controller. United States: N. p., 2019. Web.
Piga, Leonardo. Managing cluster-level performance variability without a centralized controller. United States.
Piga, Leonardo. "Managing cluster-level performance variability without a centralized controller". United States. https://www.osti.gov/servlets/purl/1568096.
@article{osti_1568096,
title = {Managing cluster-level performance variability without a centralized controller},
author = {Piga, Leonardo},
abstractNote = {Systems, apparatuses, and methods for managing cluster-level performance variability without a centralized controller are described. Each node of a multi-node cluster tracks a maximum and minimum progress across the plurality of nodes for a workload executed by the cluster. Each node also tracks its local progress on its current task. Each node also utilizes a comparison of the local progress to reported maximum and minimum progress across the cluster to identify a critical, or slow, node and whether to increase or reduce an amount of power allocated to the node. The nodes append information about the maximum and minimum progress to messages sent to other nodes to report their knowledge of maximum and minimum progress with other nodes. A node updates its local information if the node receives a message from another node with more up-to-date information about the state of progress across the cluster.},
place = {United States},
year = {2019},
month = {3}
}
