Title: ytopt: Autotuning Scientific Applications for Energy Efficiency at Large Scales

Journal Article · Concurrency and Computation. Practice and Experience
DOI: https://doi.org/10.1002/cpe.8322 · OSTI ID:2475676
[1]; [2]; [1]; [3]; [1]; [1]; [1]; [4]; [4]; [5]
  1. Argonne National Laboratory (ANL), Argonne, IL (United States)
  2. Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
  3. Hanyang Univ., Seoul (Korea, Republic of)
  4. Intel Corporation, Hillsboro, OR (United States)
  5. Univ. of Utah, Salt Lake City, UT (United States)

As we enter the exascale computing era, efficiently utilizing power and optimizing the performance of scientific applications under power and energy constraints have become critical and challenging. We propose a low-overhead autotuning framework to autotune performance and energy for various hybrid MPI/OpenMP scientific applications at large scales and to explore the tradeoffs between application runtime and power/energy for energy-efficient application execution. We then use this framework to autotune four ECP proxy applications: XSBench, AMG, SWFFT, and SW4lite. Our approach uses Bayesian optimization with a Random Forest surrogate model to effectively search parameter spaces with up to 6 million different configurations on two large-scale HPC production systems, Theta at Argonne National Laboratory and Summit at Oak Ridge National Laboratory. The experimental results show that our autotuning framework has low overhead and achieves good scalability at large scales. Using the proposed autotuning framework to identify the best configurations, we achieve up to 91.59% performance improvement, up to 21.2% energy savings, and up to 37.84% EDP (energy delay product) improvement on up to 4096 nodes.
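
The abstract reports three metrics: performance improvement, energy savings, and EDP improvement. A minimal sketch of how they relate follows. It assumes the common definition of EDP as energy multiplied by runtime and treats each improvement as a relative reduction against a baseline run; the record does not state the paper's exact formulas. The function names and measurements are hypothetical and are not taken from the paper or from ytopt.

```python
def edp(energy_joules, runtime_seconds):
    """Energy delay product: energy multiplied by runtime (lower is better)."""
    return energy_joules * runtime_seconds


def percent_improvement(baseline, tuned):
    """Relative reduction of a lower-is-better metric, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - tuned) / baseline


# Hypothetical measurements for one application run (assumed values).
baseline = {"runtime_s": 100.0, "energy_j": 20000.0}
tuned = {"runtime_s": 80.0, "energy_j": 18000.0}

runtime_gain = percent_improvement(baseline["runtime_s"], tuned["runtime_s"])
energy_gain = percent_improvement(baseline["energy_j"], tuned["energy_j"])
edp_gain = percent_improvement(
    edp(baseline["energy_j"], baseline["runtime_s"]),
    edp(tuned["energy_j"], tuned["runtime_s"]),
)

print(f"runtime improvement: {runtime_gain:.2f}%")
print(f"energy savings:      {energy_gain:.2f}%")
print(f"EDP improvement:     {edp_gain:.2f}%")
```

Because EDP multiplies the two quantities, a configuration that reduces both runtime and energy improves EDP by more than either reduction alone. A configuration that trades one for the other can still improve or worsen EDP, depending on the ratio.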

Research Organization:
Argonne National Laboratory (ANL), Argonne, IL (United States)
Sponsoring Organization:
National Science Foundation (NSF); USDOE; USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR). Scientific Discovery through Advanced Computing (SciDAC)
Grant/Contract Number:
AC02-06CH11357
OSTI ID:
2475676
Journal Information:
Concurrency and Computation. Practice and Experience, Vol. 37, Issue 1; ISSN 1532-0626; ISSN 1532-0634
Publisher:
Wiley
Country of Publication:
United States
Language:
English
