Optimizing Irregular Applications for Energy and Performance on the Tilera Many-core Architecture
Optimizing applications simultaneously for energy and performance is a complex problem. High-performance, parallel, irregular applications are notoriously hard to optimize because of their data-dependent memory accesses, lack of structured locality, and complex data structures and code patterns. Irregular kernels are growing in importance in applications such as machine learning, graph analytics, and combinatorial scientific computing. Implementing these kernels efficiently, in both performance and energy, on modern energy-efficient multicore and many-core platforms is therefore an important and challenging problem. We present results from optimizing two irregular applications, the Louvain method for community detection (Grappolo) and high-performance conjugate gradient (HPCCG), on the Tilera many-core system. We have significantly extended MIT's OpenTuner auto-tuning framework to conduct a detailed study of platform-independent and platform-specific optimizations that improve performance and reduce total energy consumption. We explore the optimization design space along three dimensions: memory layout schemes, compiler-based code transformations, and optimization of parallel loop schedules. Using auto-tuning, we demonstrate whole-node energy savings of up to 41% relative to a baseline instantiation, and up to 31% relative to manually optimized variants.
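To make the idea of a three-dimensional design space concrete, the following is a minimal sketch of how such a search could be organized. It does not use OpenTuner and is not the authors' implementation: the option names in `SEARCH_SPACE`, the `measure` function and its synthetic time and energy model, and the use of exhaustive enumeration in place of OpenTuner's search techniques are all assumptions made for illustration.

```python
import itertools
import random

# Hypothetical options along the three dimensions named in the abstract.
# The specific values are illustrative assumptions, not the paper's settings.
SEARCH_SPACE = {
    "memory_layout": ["default", "striped", "local_homed"],
    "compiler_flags": ["-O2", "-O3", "-O3 -funroll-loops"],
    "loop_schedule": ["static", "dynamic", "guided"],
}


def measure(config):
    """Stand-in for one run of the kernel; returns (seconds, joules).

    The numbers come from a made-up deterministic model so the sketch runs
    anywhere. A real tuner would build and launch the application and read
    the platform's power measurements instead.
    """
    rng = random.Random(repr(sorted(config.items())))
    seconds = 10.0 * rng.uniform(0.6, 1.0)
    watts = 40.0 * rng.uniform(0.7, 1.0)
    return seconds, seconds * watts


def all_configs(space):
    """Yield every combination of one option per dimension."""
    keys = list(space)
    for values in itertools.product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))


def tune(space, baseline):
    """Return the lowest-energy configuration and its saving vs. baseline (%)."""
    _, base_energy = measure(baseline)
    best = min(all_configs(space), key=lambda c: measure(c)[1])
    _, best_energy = measure(best)
    saving = 100.0 * (base_energy - best_energy) / base_energy
    return best, saving


if __name__ == "__main__":
    baseline = {"memory_layout": "default",
                "compiler_flags": "-O2",
                "loop_schedule": "static"}
    best, saving = tune(SEARCH_SPACE, baseline)
    print(f"best configuration: {best}")
    print(f"energy saving vs. baseline: {saving:.1f}%")
```

Exhaustive enumeration is only practical because this toy space has 27 points; an auto-tuner becomes worthwhile when the space is too large to enumerate and each measurement requires a full application run.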
- Research Organization: Pacific Northwest National Laboratory (PNNL), Richland, WA (US)
- Sponsoring Organization: USDOE
- DOE Contract Number: AC05-76RL01830
- OSTI ID: 1194293
- Report Number(s): PNNL-SA-108596; 400470000
- Country of Publication: United States
- Language: English
Similar Records
Exploring performance and energy tradeoffs for irregular applications: A case study on the Tilera many-core architecture
Journal Article · Thu Jun 01 00:00:00 EDT 2017 · Journal of Parallel and Distributed Computing · OSTI ID:1347851
Scaling Graph Community Detection on the Tilera Many-core Architecture
Conference · Sun Nov 30 23:00:00 EST 2014 · OSTI ID:1194322
Lattice Boltzmann Simulation Optimization on Leading Multicore Platforms
Conference · Thu Jan 31 23:00:00 EST 2008 · OSTI ID:964372