OSTI.GOV: U.S. Department of Energy
Office of Scientific and Technical Information

Title: The Institute for Sustained Performance, Energy, and Resilience

Abstract

The Green Queue framework automates the development and deployment of customized architecture- and application-aware power-saving recipes for large-scale HPC applications, achieving up to 21% and 32% energy savings on HPC production applications run at scale. Additional work focused on the memory subsystem: our methodology uses application and machine characterization information to build predictive machine learning models that accurately quantify phase-level sensitivity to the reduced per-core memory bandwidth that results from changes in the memory bus frequency made to reduce power. We evaluated the predictive capability of the models on real applications and validated them at a fine-grained level by examining 43 individual computational phases, or application hotspots, as well as the whole application. For more than 91% of the application hotspots, the prediction error is less than 10% (15). Building on these validated performance and power models, collaborations among SUPER team members produced an automated end-to-end system that reduces the complexity of developing and deploying machine learning models for performance, power, and energy. The new framework, Automatic Multi-objective Modeling with Machine Learning (AutoMOMML), enabled multi-objective optimization (power and performance) for HPC workloads.
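
As a rough reading aid for the validation figures above, the short sketch below works out the smallest number of hotspots that "more than 91%" can correspond to. It assumes the percentage is taken over the same 43 phases mentioned in the abstract, which the abstract does not state explicitly; the helper name `min_count_above` is hypothetical and used only for illustration.

```python
import math

# Figures quoted in the abstract.
TOTAL_HOTSPOTS = 43   # individual computational phases / hotspots examined
THRESHOLD = 0.91      # "more than 91%" of hotspots had < 10% prediction error


def min_count_above(total: int, fraction: float) -> int:
    """Smallest integer k such that k / total is strictly greater than fraction.

    Hypothetical helper for illustration; not part of the report.
    """
    return math.floor(total * fraction) + 1


# Assumption: the 91% figure uses the 43 hotspots as its denominator.
k = min_count_above(TOTAL_HOTSPOTS, THRESHOLD)
share = 100 * k / TOTAL_HOTSPOTS
print(f"at least {k} of {TOTAL_HOTSPOTS} hotspots ({share:.1f}%)")
```

Under that assumption, 39 of 43 hotspots would be about 90.7%, so the claim implies that at least 40 of the 43 hotspots were predicted with less than 10% error.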

Authors:
Carrington, Laura [1]
  1. University of California, San Diego, CA (United States)
Publication Date:
2019-11-14
Research Org.:
Univ. of California, San Diego, CA (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR). Scientific Discovery through Advanced Computing (SciDAC)
OSTI Identifier:
1481285
Report Number(s):
DOE-UCSD-0006620
DE-FG02-11ER26049
DOE Contract Number:  
SC0006620
Resource Type:
Technical Report
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; HPC energy efficiency and modeling

Citation Formats

MLA: Carrington, Laura. The Institute for Sustained Performance, Energy, and Resilience. United States: N. p., 2019. Web. doi:10.2172/1481285.
APA: Carrington, Laura. The Institute for Sustained Performance, Energy, and Resilience. United States. https://doi.org/10.2172/1481285
Chicago: Carrington, Laura. 2019. "The Institute for Sustained Performance, Energy, and Resilience". United States. https://doi.org/10.2172/1481285. https://www.osti.gov/servlets/purl/1481285.
@techreport{osti_1481285,
title = {The Institute for Sustained Performance, Energy, and Resilience},
author = {Carrington, Laura},
institution = {Univ. of California, San Diego, CA (United States)},
abstractNote = {The Green Queue framework automates the development and deployment of customized architecture- and application-aware power-saving recipes for large-scale HPC applications, achieving up to 21% and 32% energy savings on HPC production applications run at scale. Additional work focused on the memory subsystem: our methodology uses application and machine characterization information to build predictive machine learning models that accurately quantify phase-level sensitivity to the reduced per-core memory bandwidth that results from changes in the memory bus frequency made to reduce power. We evaluated the predictive capability of the models on real applications and validated them at a fine-grained level by examining 43 individual computational phases, or application hotspots, as well as the whole application. For more than 91% of the application hotspots, the prediction error is less than 10% (15). Building on these validated performance and power models, collaborations among SUPER team members produced an automated end-to-end system that reduces the complexity of developing and deploying machine learning models for performance, power, and energy. The new framework, Automatic Multi-objective Modeling with Machine Learning (AutoMOMML), enabled multi-objective optimization (power and performance) for HPC workloads.},
doi = {10.2172/1481285},
url = {https://www.osti.gov/biblio/1481285},
place = {United States},
year = {2019},
month = nov
}