OSTI.GOV: U.S. Department of Energy
Office of Scientific and Technical Information

Title: Exploring performance and energy tradeoffs for irregular applications: A case study on the Tilera many-core architecture

Abstract

High performance, parallel applications with irregular data accesses are becoming a critical workload class for modern systems. In particular, the execution of such workloads on emerging many-core systems is expected to be a significant component of applications in data mining, machine learning, scientific computing and graph analytics. However, power and energy constraints limit the capabilities of individual cores, memory hierarchy and on-chip interconnect of such systems, thus leading to architectural and software trade-offs that must be understood in the context of the intended application’s behavior. Irregular applications are notoriously hard to optimize given their data-dependent access patterns, lack of structured locality and complex data structures and code patterns. We have ported two irregular applications, graph community detection using the Louvain method (Grappolo) and high-performance conjugate gradient (HPCCG), to the Tilera many-core system and have conducted a detailed study of platform-independent and platform-specific optimizations that improve their performance as well as reduce their overall energy consumption. To conduct this study, we employ an auto-tuning based approach that explores the optimization design space along three dimensions: memory layout schemes, GCC compiler flag choices and OpenMP loop scheduling options. We leverage MIT’s OpenTuner auto-tuning framework to explore and recommend energy-optimal choices for different combinations of parameters. We then conduct an in-depth architectural characterization to understand the memory behavior of the selected workloads. Finally, we perform a correlation study to demonstrate the interplay between the hardware behavior and application characteristics. Using auto-tuning, we demonstrate whole-node energy savings and performance improvements of up to 49.6% and 60% relative to a baseline instantiation, and up to 31% and 45.4% relative to manually optimized variants.

Authors:
Panyala, Ajay; Chavarría-Miranda, Daniel; Manzano, Joseph B.; Tumeo, Antonino; Halappanavar, Mahantesh
Publication Date:
2017-06-01
Research Org.:
Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1347851
Report Number(s):
PNNL-SA-118976
Journal ID: ISSN 0743-7315; 400470000
DOE Contract Number:  
AC05-76RL01830
Resource Type:
Journal Article
Resource Relation:
Journal Name: Journal of Parallel and Distributed Computing; Journal Volume: 104
Country of Publication:
United States
Language:
English
Subject:
auto tuning; irregular applications; community detection; sparse conjugate gradient; energy optimization

Citation Formats

Panyala, Ajay, Chavarría-Miranda, Daniel, Manzano, Joseph B., Tumeo, Antonino, and Halappanavar, Mahantesh. Exploring performance and energy tradeoffs for irregular applications: A case study on the Tilera many-core architecture. United States: N. p., 2017. Web. doi:10.1016/j.jpdc.2016.06.006.
Panyala, Ajay, Chavarría-Miranda, Daniel, Manzano, Joseph B., Tumeo, Antonino, & Halappanavar, Mahantesh. Exploring performance and energy tradeoffs for irregular applications: A case study on the Tilera many-core architecture. United States. doi:10.1016/j.jpdc.2016.06.006.
Panyala, Ajay, Chavarría-Miranda, Daniel, Manzano, Joseph B., Tumeo, Antonino, and Halappanavar, Mahantesh. 2017. "Exploring performance and energy tradeoffs for irregular applications: A case study on the Tilera many-core architecture". United States. doi:10.1016/j.jpdc.2016.06.006.
@article{osti_1347851,
title = {Exploring performance and energy tradeoffs for irregular applications: A case study on the Tilera many-core architecture},
author = {Panyala, Ajay and Chavarría-Miranda, Daniel and Manzano, Joseph B. and Tumeo, Antonino and Halappanavar, Mahantesh},
abstractNote = {High performance, parallel applications with irregular data accesses are becoming a critical workload class for modern systems. In particular, the execution of such workloads on emerging many-core systems is expected to be a significant component of applications in data mining, machine learning, scientific computing and graph analytics. However, power and energy constraints limit the capabilities of individual cores, memory hierarchy and on-chip interconnect of such systems, thus leading to architectural and software trade-offs that must be understood in the context of the intended application’s behavior. Irregular applications are notoriously hard to optimize given their data-dependent access patterns, lack of structured locality and complex data structures and code patterns. We have ported two irregular applications, graph community detection using the Louvain method (Grappolo) and high-performance conjugate gradient (HPCCG), to the Tilera many-core system and have conducted a detailed study of platform-independent and platform-specific optimizations that improve their performance as well as reduce their overall energy consumption. To conduct this study, we employ an auto-tuning based approach that explores the optimization design space along three dimensions: memory layout schemes, GCC compiler flag choices and OpenMP loop scheduling options. We leverage MIT’s OpenTuner auto-tuning framework to explore and recommend energy-optimal choices for different combinations of parameters. We then conduct an in-depth architectural characterization to understand the memory behavior of the selected workloads. Finally, we perform a correlation study to demonstrate the interplay between the hardware behavior and application characteristics. Using auto-tuning, we demonstrate whole-node energy savings and performance improvements of up to 49.6% and 60% relative to a baseline instantiation, and up to 31% and 45.4% relative to manually optimized variants.},
doi = {10.1016/j.jpdc.2016.06.006},
journal = {Journal of Parallel and Distributed Computing},
volume = 104,
place = {United States},
year = {2017},
month = jun
}