OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Autotuning in High-Performance Computing Applications

Journal Article · Proceedings of the IEEE
 [1]; [2]; [3]; [4]; [5]; [6]; [7]
  1. Argonne National Lab. (ANL), Argonne, IL (United States)
  2. Univ. of Tennessee, Knoxville, TN (United States)
  3. Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
  4. Univ. of Utah, Salt Lake City, UT (United States)
  5. Univ. of Maryland, College Park, MD (United States)
  6. Univ. of Oregon, Eugene, OR (United States)
  7. Georgia Inst. of Technology, Atlanta, GA (United States)

Autotuning refers to the automatic generation of a search space of possible implementations of a computation that are evaluated through models and/or empirical measurement to identify the most desirable implementation. Autotuning has the potential to dramatically improve the performance portability of petascale and exascale applications. To date, autotuning has been used primarily in high-performance applications through tunable libraries or previously tuned application code that is integrated directly into the application. This paper draws on the authors' extensive experience applying autotuning to high-performance applications, describing both successes and future challenges. If autotuning is to be widely used in the HPC community, researchers must address the software engineering challenges, manage configuration overheads, and continue to demonstrate significant performance gains and portability across architectures. In particular, tools that configure the application must be integrated into the application build process so that tuning can be reapplied as the application and target architectures evolve.
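
The definition above (generate a search space of implementations, evaluate each one, keep the most desirable) can be illustrated with a minimal sketch of purely empirical autotuning. Everything in it is an assumption made for illustration and is not taken from the paper: the toy kernel, the names `make_variants`, `measure`, and `autotune`, the candidate chunk sizes, and the choice of exhaustive search with best-of-N timing. Practical autotuners may also use performance models and smarter search strategies.

```python
import time


def make_variants(chunk_sizes=(16, 256, 4096)):
    """Build a (hypothetical) search space: one kernel variant per chunk size."""
    def make(chunk):
        def kernel(data):
            # Same computation for every variant; only the chunking differs.
            total = 0
            for start in range(0, len(data), chunk):
                total += sum(data[start:start + chunk])
            return total
        return kernel
    return {chunk: make(chunk) for chunk in chunk_sizes}


def measure(kernel, data, repeats=3):
    """Empirical evaluation: best wall-clock time over a few repetitions."""
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        kernel(data)
        best = min(best, time.perf_counter() - t0)
    return best


def autotune(variants, data):
    """Exhaustively time every variant and return the fastest parameter."""
    timings = {param: measure(kernel, data) for param, kernel in variants.items()}
    best_param = min(timings, key=timings.get)
    return best_param, timings


variants = make_variants()
data = list(range(100_000))
best, timings = autotune(variants, data)
print(f"fastest chunk size on this machine: {best}")
```

The selected chunk size depends on the machine and the input, which is why the abstract argues that tuning should be part of the build process and rerun as the application and target architectures change.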

Research Organization:
Argonne National Laboratory (ANL), Argonne, IL (United States); Lawrence Livermore National Laboratory (LLNL), Livermore, CA (United States)
Sponsoring Organization:
USDOE National Nuclear Security Administration (NNSA); USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)
Grant/Contract Number:
AC02-06CH11357; AC52-07NA27344
OSTI ID:
1488544
Alternate ID(s):
OSTI ID: 1868859
Report Number(s):
LLNL-JRNL-834240; 147743
Journal Information:
Proceedings of the IEEE, Vol. 106, Issue 11; ISSN 0018-9219
Publisher:
Institute of Electrical and Electronics Engineers
Country of Publication:
United States
Language:
English
Citation Metrics:
Cited by: 54 works (citation information provided by Web of Science)
