Managing cluster-level performance variability without a centralized controller
Abstract
Systems, apparatuses, and methods for managing cluster-level performance variability without a centralized controller are described. Each node of a multi-node cluster tracks a maximum and minimum progress across the plurality of nodes for a workload executed by the cluster. Each node also tracks its local progress on its current task. Each node also utilizes a comparison of the local progress to reported maximum and minimum progress across the cluster to identify a critical, or slow, node and whether to increase or reduce an amount of power allocated to the node. The nodes append information about the maximum and minimum progress to messages sent to other nodes to report their knowledge of maximum and minimum progress with other nodes. A node updates its local information if the node receives a message from another node with more up-to-date information about the state of progress across the cluster.
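The scheme in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The names `ProgressView`, `Node`, `piggyback`, `receive`, and `adjust_power`, the logical `timestamp` used to judge which report is more up to date, and the fixed power `step` are all assumptions made for illustration. The sketch also assumes a simple policy: a node at the known minimum is treated as critical and gets more power, while a node at the known maximum gives power back.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ProgressView:
    """Cluster-wide progress bounds carried on a message (hypothetical format)."""
    max_progress: float
    min_progress: float
    timestamp: int  # assumed logical clock; larger means more up to date


class Node:
    """One cluster node; there is no central controller in this sketch."""

    def __init__(self, node_id: str, power: float = 100.0, step: float = 10.0):
        self.node_id = node_id
        self.power = power          # power currently allocated to this node
        self.step = step            # assumed fixed adjustment size
        self.local_progress = 0.0   # progress on the node's current task
        self.view: Optional[ProgressView] = None  # last adopted remote report

    def bounds(self) -> Tuple[float, float]:
        """Return (max, min) progress known to this node, including itself."""
        if self.view is None:
            return self.local_progress, self.local_progress
        return (max(self.view.max_progress, self.local_progress),
                min(self.view.min_progress, self.local_progress))

    def piggyback(self, now: int) -> ProgressView:
        """Build the progress report to append to an outgoing message."""
        hi, lo = self.bounds()
        return ProgressView(hi, lo, now)

    def receive(self, report: ProgressView) -> None:
        """Adopt a received report only if it is newer than the stored one."""
        if self.view is None or report.timestamp > self.view.timestamp:
            self.view = report

    def adjust_power(self) -> None:
        """Raise power if this node is the slowest known, lower it if fastest."""
        hi, lo = self.bounds()
        if hi == lo:
            return  # no known spread across the cluster, nothing to correct
        if self.local_progress <= lo:
            self.power += self.step   # critical (slow) node
        elif self.local_progress >= hi:
            self.power -= self.step   # node is ahead and can give power back
```

In this sketch, every ordinary message between nodes would carry the result of `piggyback`, so progress information spreads through existing traffic rather than through a coordinator.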
- Inventors:
- Piga, Leonardo
- Issue Date:
- 03/19/2019
- Research Org.:
- Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
- Sponsoring Org.:
- USDOE
- OSTI Identifier:
- 1568096
- Patent Number(s):
- 10237335
- Application Number:
- 15/183,625
- Assignee:
- Advanced Micro Devices, Inc. (Santa Clara, CA)
- Patent Classifications (CPCs):
- G - PHYSICS; G06 - COMPUTING; G06F - ELECTRIC DIGITAL DATA PROCESSING
- H - ELECTRICITY; H04 - ELECTRIC COMMUNICATION TECHNIQUE; H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- DOE Contract Number:
- AC02-05CH11231
- Resource Type:
- Patent
- Resource Relation:
- Patent File Date: 06/15/2016
- Country of Publication:
- United States
- Language:
- English
Citation Formats
Piga, Leonardo. Managing cluster-level performance variability without a centralized controller. United States: N. p., 2019. Web.
Piga, Leonardo. Managing cluster-level performance variability without a centralized controller. United States.
Piga, Leonardo. Tue Mar 19 2019. "Managing cluster-level performance variability without a centralized controller". United States. https://www.osti.gov/servlets/purl/1568096.
@article{osti_1568096,
title = {Managing cluster-level performance variability without a centralized controller},
author = {Piga, Leonardo},
abstractNote = {Systems, apparatuses, and methods for managing cluster-level performance variability without a centralized controller are described. Each node of a multi-node cluster tracks a maximum and minimum progress across the plurality of nodes for a workload executed by the cluster. Each node also tracks its local progress on its current task. Each node also utilizes a comparison of the local progress to reported maximum and minimum progress across the cluster to identify a critical, or slow, node and whether to increase or reduce an amount of power allocated to the node. The nodes append information about the maximum and minimum progress to messages sent to other nodes to report their knowledge of maximum and minimum progress with other nodes. A node updates its local information if the node receives a message from another node with more up-to-date information about the state of progress across the cluster.},
doi = {},
journal = {},
number = {},
volume = {},
place = {United States},
year = {2019},
month = {3}
}
Works referenced in this record:
Dynamically adaptive, resource aware system and method for scheduling
patent, June 2017
- Gupta, Shekhar; Fritz, Christian; de Kleer, Johan
- US Patent Document 9,672,064
Optimization of Map-reduce Shuffle Performance Through Shuffler I/O Pipeline Actions and Planning
patent-application, May 2015
- Hu, Zhenhua; Ma, Hao Hai; Tang, Wentao
- US Patent Application 14/090282; 20150150017
Parallel I/O write processing for use in clustered file systems having cache storage
patent, April 2017
- Gunda, Kalyan C.; Hildebrand, Dean; Naik, Manoj P.
- US Patent Document 9,614,926
Parallel file system and method for multiple node file access
patent, February 2000
- Schmuck, Frank B.; McNabb, Daniel Lloyd; Wyllie, James C.
- US Patent Document 6,023,706
Optimization of map-reduce shuffle performance through shuffler I/O pipeline actions and planning
patent, May 2017
- Hu, Zhenhua; Ma, Hao Hai; Tang, Wentao
- US Patent Document 9,665,404
Parallel file system and method with a metadata node
patent, October 1999
- Schmuck, Frank B.; Curran, Robert J.; Wyllie, James C.
- US Patent Document 5,974,424
System and method for computer cluster virtualization using dynamic boot images and virtual disk
patent, May 2012
- Davidson, Shannon V.; Peterson, Robert J.
- US Patent Document 8,190,714