DOE Patents, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Managing variations among nodes in parallel system frameworks

Abstract

Systems, apparatuses, and methods for managing variations among nodes in parallel system frameworks. Sensor and performance data associated with the nodes of a multi-node cluster may be monitored to detect variations among the nodes. A variability metric may be calculated for each node of the cluster based on the sensor and performance data associated with the node. The variability metrics may then be used by a mapper to efficiently map tasks of a parallel application to the nodes of the cluster. In one embodiment, the mapper may assign the critical tasks of the parallel application to the nodes with the lowest variability metrics. In another embodiment, the hardware of the nodes may be reconfigured so as to reduce the node-to-node variability.
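The abstract does not say how the variability metric is computed or how the mapper ranks nodes. The sketch below is a minimal illustration under stated assumptions, not the patented method. The per-node fields (`temperature_c`, `power_w`, `iterations_per_s`), the metric (a sum of absolute z-scores against the cluster mean), and the helpers `variability_metrics` and `map_tasks` are all hypothetical. Lower scores mean a node behaves more like its peers, so critical tasks are placed on the lowest-scoring nodes first.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class NodeSample:
    """Hypothetical per-node telemetry: two sensor readings and one performance counter."""
    name: str
    temperature_c: float      # sensor data (assumed field)
    power_w: float            # sensor data (assumed field)
    iterations_per_s: float   # performance data (assumed field)


FIELDS = ("temperature_c", "power_w", "iterations_per_s")


def variability_metrics(nodes: list[NodeSample]) -> dict[str, float]:
    """Hypothetical metric: sum of absolute z-scores of each node's readings.

    A lower score means the node deviates less from the cluster average.
    """
    stats = {}
    for f in FIELDS:
        values = [getattr(n, f) for n in nodes]
        stats[f] = (mean(values), pstdev(values))
    scores = {}
    for n in nodes:
        score = 0.0
        for f in FIELDS:
            mu, sigma = stats[f]
            if sigma > 0:  # a signal identical on every node adds no variability
                score += abs(getattr(n, f) - mu) / sigma
        scores[n.name] = score
    return scores


def map_tasks(tasks: list[tuple[str, bool]], scores: dict[str, float]) -> dict[str, str]:
    """Assign (task_name, is_critical) pairs to nodes, critical tasks first.

    Assumes at most one task per node.
    """
    if len(tasks) > len(scores):
        raise ValueError("more tasks than nodes")
    ranked_nodes = sorted(scores, key=scores.get)  # lowest variability first
    # Critical tasks sort ahead of non-critical ones; the sort is stable otherwise.
    ordered = sorted(tasks, key=lambda t: not t[1])
    return {task: node for (task, _), node in zip(ordered, ranked_nodes)}


# Usage: node n2 runs hotter, draws more power, and is slower than its peers.
cluster = [
    NodeSample("n0", 60.0, 200.0, 100.0),
    NodeSample("n1", 61.0, 202.0, 99.0),
    NodeSample("n2", 80.0, 250.0, 80.0),
]
scores = variability_metrics(cluster)
mapping = map_tasks([("solver", True), ("io", False), ("logger", False)], scores)
print(mapping)  # {'solver': 'n1', 'io': 'n0', 'logger': 'n2'}
```

Using z-scores keeps signals with different units (degrees, watts, iterations per second) on a common scale; a real system might instead weight signals by their measured effect on task runtime.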

Inventors:
Wasmundt, Samuel Lawrence; Piga, Leonardo; Paul, Indrani; Huang, Wei; Arora, Manish
Issue Date:
July 2019
Research Org.:
Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1568617
Patent Number(s):
10,355,966
Application Number:
15/081,558
Assignee:
Advanced Micro Devices, Inc. (Santa Clara, CA)
DOE Contract Number:  
AC52-07NA27344
Resource Type:
Patent
Resource Relation:
Patent File Date: 03/25/2016
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING

Citation Formats

MLA: Wasmundt, Samuel Lawrence, Piga, Leonardo, Paul, Indrani, Huang, Wei, and Arora, Manish. Managing variations among nodes in parallel system frameworks. United States: N. p., 2019. Web.
APA: Wasmundt, Samuel Lawrence, Piga, Leonardo, Paul, Indrani, Huang, Wei, & Arora, Manish. Managing variations among nodes in parallel system frameworks. United States.
Chicago: Wasmundt, Samuel Lawrence, Piga, Leonardo, Paul, Indrani, Huang, Wei, and Arora, Manish. 2019. "Managing variations among nodes in parallel system frameworks". United States. https://www.osti.gov/servlets/purl/1568617.
BibTeX:
@article{osti_1568617,
title = {Managing variations among nodes in parallel system frameworks},
author = {Wasmundt, Samuel Lawrence and Piga, Leonardo and Paul, Indrani and Huang, Wei and Arora, Manish},
abstractNote = {Systems, apparatuses, and methods for managing variations among nodes in parallel system frameworks. Sensor and performance data associated with the nodes of a multi-node cluster may be monitored to detect variations among the nodes. A variability metric may be calculated for each node of the cluster based on the sensor and performance data associated with the node. The variability metrics may then be used by a mapper to efficiently map tasks of a parallel application to the nodes of the cluster. In one embodiment, the mapper may assign the critical tasks of the parallel application to the nodes with the lowest variability metrics. In another embodiment, the hardware of the nodes may be reconfigured so as to reduce the node-to-node variability.},
note = {U.S. Patent 10,355,966},
place = {United States},
year = {2019},
month = {7}
}