OSTI.GOV: U.S. Department of Energy, Office of Scientific and Technical Information

Title: Institute for Sustained Performance, Energy, and Resilience (SuPER)

Abstract

The University of Tennessee (UTK) and University of Texas at El Paso (UTEP) partnership supported the three main thrusts of the SUPER project: performance, energy, and resilience. The UTK-UTEP effort thus helped advance the main goal of SUPER, which was to ensure that DOE's computational scientists can successfully exploit the emerging generation of high performance computing (HPC) systems. This goal is being met by providing application scientists with strategies and tools to productively maximize performance, conserve energy, and attain resilience.
The primary vehicle through which UTK provided performance measurement support to SUPER and the larger HPC community is the Performance Application Programming Interface (PAPI). PAPI is an ongoing project that provides a consistent interface and methodology for collecting hardware performance information from various hardware and software components, including most major CPUs, GPUs and accelerators, interconnects, I/O systems, and power interfaces, as well as virtual cloud environments. The PAPI software is widely used on DOE supercomputers for performance modeling of scientific and engineering applications, for example the HOMME (High Order Methods Modeling Environment) climate code and the GAMESS and NWChem computational chemistry codes. PAPI is also widely deployed as middleware for higher-level profiling, tracing, and sampling tools (e.g., CrayPat, HPCToolkit, Scalasca, Score-P, TAU, Vampir, PerfExpert), making it the de facto standard for hardware counter analysis. PAPI has established itself as fundamental software infrastructure in every application domain (spanning academia, government, and industry) where improving performance can be mission critical. Ultimately, as more application scientists migrate their applications to HPC platforms, they will benefit from the extended capabilities this grant brought to PAPI to analyze and optimize performance in these environments, whether they use PAPI directly or via third-party performance tools. Capabilities added to PAPI through this grant include support for new architectures, such as the latest GPU and Xeon Phi accelerators, and advanced power measurement and management features.
Another important topic for the UTK team was providing support for a rich ecosystem of different fault management strategies in the context of parallel computing. Our long-term efforts have been oriented toward proposing flexible strategies and providing building blocks that application developers can use to build the most efficient fault management technique for their application. These efforts span the entire software spectrum: theoretical models of existing strategies that make their performance easy to assess, algorithmic modifications that take advantage of specific mathematical properties for data redundancy, and extensions to widely used programming paradigms that empower application developers to deal with all types of faults. We have also continued our tight collaborations with users to help them adopt these technologies and ensure that their applications always deliver meaningful scientific data.
Large supercomputer systems are becoming more and more power and energy constrained, and future systems and the applications running on them will need to be optimized to run under power caps and/or to minimize energy consumption. The UTEP team contributed to the SUPER energy thrust by developing power modeling methodologies and investigating power management strategies. Scalability modeling results showed that some applications can scale better with respect to an increasing power budget than with respect to only the number of processors. Power management, in particular shifting power to processors on the critical path of an application execution, can reduce perturbation due to system noise and other sources of runtime variability, which are growing problems on large-scale power-constrained computer systems.

Authors:
 Jagode, Heike [1]; Bosilca, George [1]; Danalis, Anthony [1]; Dongarra, Jack [1]; Moore, Shirley [2]
  1. Univ. of Tennessee, Knoxville, TN (United States)
  2. Univ. of Texas, El Paso, TX (United States)
Publication Date:
Research Org.:
Univ. of Tennessee, Knoxville, TN (United States)
Sponsoring Org.:
USDOE
Contributing Org.:
Univ. of Texas, El Paso, TX (United States)
OSTI Identifier:
1333889
Report Number(s):
DOE-UTK-UTEP-6733-1
DOE Contract Number:  
SC0006733
Resource Type:
Technical Report
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; 42 ENGINEERING; Performance Analysis; PAPI; Power monitoring; Power Capping; Resilience; Performance Counters

Citation Formats

Jagode, Heike, Bosilca, George, Danalis, Anthony, Dongarra, Jack, and Moore, Shirley. Institute for Sustained Performance, Energy, and Resilience (SuPER). United States: N. p., 2016. Web. doi:10.2172/1333889.
Jagode, Heike, Bosilca, George, Danalis, Anthony, Dongarra, Jack, & Moore, Shirley. Institute for Sustained Performance, Energy, and Resilience (SuPER). United States. doi:10.2172/1333889.
Jagode, Heike, Bosilca, George, Danalis, Anthony, Dongarra, Jack, and Moore, Shirley. Wed . "Institute for Sustained Performance, Energy, and Resilience (SuPER)". United States. doi:10.2172/1333889. https://www.osti.gov/servlets/purl/1333889.
@article{osti_1333889,
title = {Institute for Sustained Performance, Energy, and Resilience (SuPER)},
author = {Jagode, Heike and Bosilca, George and Danalis, Anthony and Dongarra, Jack and Moore, Shirley},
doi = {10.2172/1333889},
journal = {},
number = {},
volume = {},
place = {United States},
year = {2016},
month = {11}
}