Managing variations among nodes in parallel system frameworks
Abstract
Systems, apparatuses, and methods for managing variations among nodes in parallel system frameworks are disclosed. Sensor and performance data associated with the nodes of a multi-node cluster may be monitored to detect variations among the nodes. A variability metric may be calculated for each node of the cluster based on the sensor and performance data associated with the node. The variability metrics may then be used by a mapper to efficiently map tasks of a parallel application to the nodes of the cluster. In one embodiment, the mapper may assign the critical tasks of the parallel application to the nodes with the lowest variability metrics. In another embodiment, the hardware of the nodes may be reconfigured so as to reduce the node-to-node variability.
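The first embodiment in the abstract (compute a per-node variability metric, then give critical tasks to the nodes with the lowest metrics) can be illustrated with a minimal sketch. The record does not state how the metric is calculated, so the formula here (mean absolute z-score of each node's readings against the cluster) is purely an assumption, as are the function names `variability_metrics` and `map_tasks`, the sensor names, and the one-task-per-node simplification. It is not the patented method.

```python
from statistics import mean, pstdev


def variability_metrics(readings):
    """Return {node: metric} from {node: {sensor: value}}.

    ASSUMPTION: the metric is the mean absolute z-score of a node's
    readings relative to the whole cluster. The source record does not
    define the metric; this is only one plausible choice.
    Every node is assumed to report the same set of sensors.
    """
    sensors = sorted(next(iter(readings.values())))
    # Cluster-wide mean and population standard deviation per sensor.
    stats = {}
    for sensor in sensors:
        values = [node_data[sensor] for node_data in readings.values()]
        stats[sensor] = (mean(values), pstdev(values))

    metrics = {}
    for node, node_data in readings.items():
        scores = []
        for sensor in sensors:
            mu, sigma = stats[sensor]
            # A sensor with no spread contributes no variability.
            scores.append(abs(node_data[sensor] - mu) / sigma if sigma > 0 else 0.0)
        metrics[node] = mean(scores)
    return metrics


def map_tasks(tasks, metrics):
    """Map tasks to nodes, critical tasks first, lowest metric first.

    tasks is a list of (task_name, is_critical) pairs.
    ASSUMPTION: one task per node, which keeps the sketch short.
    """
    if len(tasks) > len(metrics):
        raise ValueError("more tasks than nodes")
    nodes_by_metric = sorted(metrics, key=metrics.get)
    # Stable sort: critical tasks (is_critical=True) move to the front.
    ordered_tasks = sorted(tasks, key=lambda task: not task[1])
    return {name: node for (name, _), node in zip(ordered_tasks, nodes_by_metric)}


# Hypothetical readings: node n2 runs hotter and slower than its peers.
readings = {
    "n0": {"temp_c": 60.0, "ipc": 1.0},
    "n1": {"temp_c": 60.0, "ipc": 1.0},
    "n2": {"temp_c": 90.0, "ipc": 0.4},
}
tasks = [("io", False), ("solver", True), ("halo", True)]
mapping = map_tasks(tasks, variability_metrics(readings))
print(mapping)
```

In this sketch the outlier node ends up with the non-critical task, so the critical path of the parallel application is less exposed to the node that deviates most from the cluster.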
- Inventors:
- Wasmundt, Samuel Lawrence; Piga, Leonardo; Paul, Indrani; Huang, Wei; Arora, Manish
- Issue Date:
- Research Org.:
- Lawrence Livermore National Laboratory (LLNL), Livermore, CA (United States)
- Sponsoring Org.:
- USDOE
- OSTI Identifier:
- 1568617
- Patent Number(s):
- 10355966
- Application Number:
- 15/081,558
- Assignee:
- Advanced Micro Devices, Inc. (Santa Clara, CA)
- Patent Classifications (CPCs):
- H - ELECTRICITY; H04 - ELECTRIC COMMUNICATION TECHNIQUE; H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- DOE Contract Number:
- AC52-07NA27344
- Resource Type:
- Patent
- Resource Relation:
- Patent File Date: 03/25/2016
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 97 MATHEMATICS AND COMPUTING
Citation Formats
Wasmundt, Samuel Lawrence, Piga, Leonardo, Paul, Indrani, Huang, Wei, and Arora, Manish. Managing variations among nodes in parallel system frameworks. United States: N. p., 2019. Web.
Wasmundt, Samuel Lawrence, Piga, Leonardo, Paul, Indrani, Huang, Wei, & Arora, Manish. Managing variations among nodes in parallel system frameworks. United States.
Wasmundt, Samuel Lawrence, Piga, Leonardo, Paul, Indrani, Huang, Wei, and Arora, Manish. 2019. "Managing variations among nodes in parallel system frameworks". United States. https://www.osti.gov/servlets/purl/1568617.
@article{osti_1568617,
title = {Managing variations among nodes in parallel system frameworks},
author = {Wasmundt, Samuel Lawrence and Piga, Leonardo and Paul, Indrani and Huang, Wei and Arora, Manish},
abstractNote = {Systems, apparatuses, and methods for managing variations among nodes in parallel system frameworks. Sensor and performance data associated with the nodes of a multi-node cluster may be monitored to detect variations among the nodes. A variability metric may be calculated for each node of the cluster based on the sensor and performance data associated with the node. The variability metrics may then be used by a mapper to efficiently map tasks of a parallel application to the nodes of the cluster. In one embodiment, the mapper may assign the critical tasks of the parallel application to the nodes with the lowest variability metrics. In another embodiment, the hardware of the nodes may be reconfigured so as to reduce the node-to-node variability.},
place = {United States},
year = {2019},
month = {7}
}
Works referenced in this record:
Dynamic Hierarchical Performance Balancing of Computational Resources
patent-application, June 2016
- Eastep, Jonathan M.; Sharapov, Ilya; Greco, Richard J.
- US Patent Application 14/583237; 20160187944