Exploring performance and energy tradeoffs for irregular applications: A case study on the Tilera many-core architecture
High performance, parallel applications with irregular data accesses are becoming a critical workload class for modern systems. In particular, the execution of such workloads on emerging many-core systems is expected to be a significant component of applications in data mining, machine learning, scientific computing and graph analytics. However, power and energy constraints limit the capabilities of individual cores, memory hierarchy and on-chip interconnect of such systems, thus leading to architectural and software trade-os that must be understood in the context of the intended application’s behavior. Irregular applications are notoriously hard to optimize given their data-dependent access patterns, lack of structured locality and complex data structures and code patterns. We have ported two irregular applications, graph community detection using the Louvain method (Grappolo) and high-performance conjugate gradient (HPCCG), to the Tilera many-core system and have conducted a detailed study of platform-independent and platform-specific optimizations that improve their performance as well as reduce their overall energy consumption. To conduct this study, we employ an auto-tuning based approach that explores the optimization design space along three dimensions - memory layout schemes, GCC compiler flag choices and OpenMP loop scheduling options. We leverage MIT’s OpenTuner auto-tuning framework to explore and recommend energy optimal choices for different combinations of parameters. We then conduct an in-depth architectural characterization to understand the memory behavior of the selected workloads. Finally, we perform a correlation study to demonstrate the interplay between the hardware behavior and application characteristics. Using auto-tuning, we demonstrate whole-node energy savings and performance improvements of up to 49:6% and 60% relative to a baseline instantiation, and up to 31% and 45:4% relative to manually optimized variants.
- Research Organization:
- Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
- Sponsoring Organization:
- USDOE
- DOE Contract Number:
- AC05-76RL01830
- OSTI ID:
- 1347851
- Report Number(s):
- PNNL-SA-118976; 400470000
- Journal Information:
- Journal of Parallel and Distributed Computing, Vol. 104; ISSN 0743-7315
- Publisher:
- Elsevier
- Country of Publication:
- United States
- Language:
- English
Similar Records
Scaling Graph Community Detection on the Tilera Many-core Architecture
Approximate Weighted Matching On Emerging Manycore and Multithreaded Architectures