OSTI.GOV: U.S. Department of Energy
Office of Scientific and Technical Information

Title: Optimizing Irregular Applications for Energy and Performance on the Tilera Many-core Architecture

Abstract

Optimizing applications simultaneously for energy and performance is a complex problem. High-performance, parallel, irregular applications are notoriously hard to optimize due to their data-dependent memory accesses, lack of structured locality, and complex data structures and code patterns. Irregular kernels are growing in importance in applications such as machine learning, graph analytics, and combinatorial scientific computing. Performance- and energy-efficient implementation of these kernels on modern, energy-efficient, multicore and many-core platforms is therefore an important and challenging problem. We present results from optimizing two irregular applications, the Louvain method for community detection (Grappolo) and high-performance conjugate gradient (HPCCG), on the Tilera many-core system. We have significantly extended MIT's OpenTuner auto-tuning framework to conduct a detailed study of platform-independent and platform-specific optimizations to improve performance as well as reduce total energy consumption. We explore the optimization design space along three dimensions: memory layout schemes, compiler-based code transformations, and optimization of parallel loop schedules. Using auto-tuning, we demonstrate whole-node energy savings of up to 41% relative to a baseline instantiation, and up to 31% relative to manually optimized variants.

Authors:
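Chavarría-Miranda, Daniel; Panyala, Ajay R.; Halappanavar, Mahantesh; Manzano Franco, Joseph B.; Tumeo, Antonino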
Publication Date:
May 2015
Research Org.:
Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1194293
Report Number(s):
PNNL-SA-108596
400470000
DOE Contract Number:  
AC05-76RL01830
Resource Type:
Conference
Resource Relation:
Conference: Proceedings of the 12th ACM International Conference on Computing Frontiers (CF 2015), May 18-21, 2015, Ischia, Italy, Article No. 12
Country of Publication:
United States
Language:
English
Subject:
irregular applications, energy optimization, many-core processors, data layouts

Citation Formats

Chavarría-Miranda, Daniel, Panyala, Ajay R., Halappanavar, Mahantesh, Manzano Franco, Joseph B., and Tumeo, Antonino. Optimizing Irregular Applications for Energy and Performance on the Tilera Many-core Architecture. United States: N. p., 2015. Web. doi:10.1145/2742854.2742865.
Chavarría-Miranda, Daniel, Panyala, Ajay R., Halappanavar, Mahantesh, Manzano Franco, Joseph B., & Tumeo, Antonino. Optimizing Irregular Applications for Energy and Performance on the Tilera Many-core Architecture. United States. doi:10.1145/2742854.2742865.
Chavarría-Miranda, Daniel, Panyala, Ajay R., Halappanavar, Mahantesh, Manzano Franco, Joseph B., and Tumeo, Antonino. Wed . "Optimizing Irregular Applications for Energy and Performance on the Tilera Many-core Architecture". United States. doi:10.1145/2742854.2742865.
@inproceedings{osti_1194293,
title = {Optimizing Irregular Applications for Energy and Performance on the Tilera Many-core Architecture},
author = {Chavarría-Miranda, Daniel and Panyala, Ajay R. and Halappanavar, Mahantesh and Manzano Franco, Joseph B. and Tumeo, Antonino},
abstractNote = {Optimizing applications simultaneously for energy and performance is a complex problem. High-performance, parallel, irregular applications are notoriously hard to optimize due to their data-dependent memory accesses, lack of structured locality, and complex data structures and code patterns. Irregular kernels are growing in importance in applications such as machine learning, graph analytics, and combinatorial scientific computing. Performance- and energy-efficient implementation of these kernels on modern, energy-efficient, multicore and many-core platforms is therefore an important and challenging problem. We present results from optimizing two irregular applications, the Louvain method for community detection (Grappolo) and high-performance conjugate gradient (HPCCG), on the Tilera many-core system. We have significantly extended MIT's OpenTuner auto-tuning framework to conduct a detailed study of platform-independent and platform-specific optimizations to improve performance as well as reduce total energy consumption. We explore the optimization design space along three dimensions: memory layout schemes, compiler-based code transformations, and optimization of parallel loop schedules. Using auto-tuning, we demonstrate whole-node energy savings of up to 41\% relative to a baseline instantiation, and up to 31\% relative to manually optimized variants.},
doi = {10.1145/2742854.2742865},
booktitle = {Proceedings of the 12th ACM International Conference on Computing Frontiers (CF 2015)},
note = {Article No. 12},
place = {United States},
year = {2015},
month = {5}
}
