skip to main content
OSTI.GOV title logo U.S. Department of Energy
Office of Scientific and Technical Information

Title: Exploring performance and energy tradeoffs for irregular applications: A case study on the Tilera many-core architecture

Journal Article · · Journal of Parallel and Distributed Computing

High performance, parallel applications with irregular data accesses are becoming a critical workload class for modern systems. In particular, the execution of such workloads on emerging many-core systems is expected to be a significant component of applications in data mining, machine learning, scientific computing and graph analytics. However, power and energy constraints limit the capabilities of individual cores, memory hierarchy and on-chip interconnect of such systems, thus leading to architectural and software trade-os that must be understood in the context of the intended application’s behavior. Irregular applications are notoriously hard to optimize given their data-dependent access patterns, lack of structured locality and complex data structures and code patterns. We have ported two irregular applications, graph community detection using the Louvain method (Grappolo) and high-performance conjugate gradient (HPCCG), to the Tilera many-core system and have conducted a detailed study of platform-independent and platform-specific optimizations that improve their performance as well as reduce their overall energy consumption. To conduct this study, we employ an auto-tuning based approach that explores the optimization design space along three dimensions - memory layout schemes, GCC compiler flag choices and OpenMP loop scheduling options. We leverage MIT’s OpenTuner auto-tuning framework to explore and recommend energy optimal choices for different combinations of parameters. We then conduct an in-depth architectural characterization to understand the memory behavior of the selected workloads. Finally, we perform a correlation study to demonstrate the interplay between the hardware behavior and application characteristics. Using auto-tuning, we demonstrate whole-node energy savings and performance improvements of up to 49:6% and 60% relative to a baseline instantiation, and up to 31% and 45:4% relative to manually optimized variants.

Research Organization:
Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
Sponsoring Organization:
USDOE
DOE Contract Number:
AC05-76RL01830
OSTI ID:
1347851
Report Number(s):
PNNL-SA-118976; 400470000
Journal Information:
Journal of Parallel and Distributed Computing, Vol. 104; ISSN 0743-7315
Publisher:
Elsevier
Country of Publication:
United States
Language:
English