DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Autotuning in High-Performance Computing Applications

Abstract

Autotuning refers to the automatic generation of a search space of possible implementations of a computation that are evaluated through models and/or empirical measurement to identify the most desirable implementation. Autotuning has the potential to dramatically improve the performance portability of petascale and exascale applications. To date, autotuning has been used primarily in high-performance applications through tunable libraries or previously tuned application code that is integrated directly into the application. This paper draws on the authors' extensive experience applying autotuning to high-performance applications, describing both successes and future challenges. If autotuning is to be widely used in the HPC community, researchers must address the software engineering challenges, manage configuration overheads, and continue to demonstrate significant performance gains and portability across architectures. In particular, tools that configure the application must be integrated into the application build process so that tuning can be reapplied as the application and target architectures evolve.
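
As an illustration of the definition above, the following is a minimal sketch of empirical autotuning, not the authors' tooling. It generates a small search space of loop-tiling variants of a matrix multiply, times each one, and keeps the fastest. The function names (`blocked_matmul`, `search_space`, `measure`, `autotune`), the tile sizes, and the problem size are all hypothetical choices made for this example, and exhaustive search stands in for the model-based or heuristic search strategies a real autotuner might use.

```python
import itertools
import time


def blocked_matmul(a, b, n, bi, bj):
    """Multiply two n x n matrices (lists of lists) using bi x bj loop tiling."""
    c = [[0.0] * n for _ in range(n)]
    for i0 in range(0, n, bi):
        for j0 in range(0, n, bj):
            for i in range(i0, min(i0 + bi, n)):
                row_a, row_c = a[i], c[i]
                for j in range(j0, min(j0 + bj, n)):
                    row_c[j] = sum(row_a[k] * b[k][j] for k in range(n))
    return c


def search_space(tile_sizes):
    """Enumerate candidate implementations as (bi, bj) tile-size pairs."""
    return list(itertools.product(tile_sizes, repeat=2))


def measure(config, a, b, n, repeats=3):
    """Empirical evaluation: best wall-clock time over a few runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        blocked_matmul(a, b, n, *config)
        best = min(best, time.perf_counter() - start)
    return best


def autotune(tile_sizes, a, b, n):
    """Exhaustive search: time every variant and return the fastest one."""
    timings = {cfg: measure(cfg, a, b, n) for cfg in search_space(tile_sizes)}
    best = min(timings, key=timings.get)
    return best, timings


# Hypothetical problem size and candidate tile sizes (assumptions for the sketch).
n = 24
a = [[float(i + j) for j in range(n)] for i in range(n)]
b = [[float(i - j) for j in range(n)] for i in range(n)]
best_config, timings = autotune([4, 8, 24], a, b, n)
print("fastest tile sizes on this machine:", best_config)
```

The winning configuration depends on the machine the search runs on, which is why the abstract argues that tuning should be reapplied as the application and target architectures evolve.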

Authors:
Balaprakash, Prasanna [1]; Dongarra, Jack [2]; Gamblin, Todd [3]; Hall, Mary [4]; Hollingsworth, Jeffrey K. [5]; Norris, Boyana [6]; Vuduc, Richard [7]
  1. Argonne National Lab. (ANL), Argonne, IL (United States)
  2. Univ. of Tennessee, Knoxville, TN (United States)
  3. Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
  4. Univ. of Utah, Salt Lake City, UT (United States)
  5. Univ. of Maryland, College Park, MD (United States)
  6. Univ. of Oregon, Eugene, OR (United States)
  7. Georgia Inst. of Technology, Atlanta, GA (United States)
Publication Date:
2018-07-31
Research Org.:
Argonne National Laboratory (ANL), Argonne, IL (United States); Lawrence Livermore National Laboratory (LLNL), Livermore, CA (United States)
Sponsoring Org.:
USDOE National Nuclear Security Administration (NNSA); USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)
OSTI Identifier:
1488544
Alternate Identifier(s):
OSTI ID: 1868859
Report Number(s):
LLNL-JRNL-834240
Journal ID: ISSN 0018-9219; 147743
Grant/Contract Number:  
AC02-06CH11357; AC52-07NA27344
Resource Type:
Accepted Manuscript
Journal Name:
Proceedings of the IEEE
Additional Journal Information:
Journal Volume: 106; Journal Issue: 11; Journal ID: ISSN 0018-9219
Publisher:
Institute of Electrical and Electronics Engineers
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; high-performance computing; performance tuning programming systems

Citation Formats

Balaprakash, Prasanna, Dongarra, Jack, Gamblin, Todd, Hall, Mary, Hollingsworth, Jeffrey K., Norris, Boyana, and Vuduc, Richard. Autotuning in High-Performance Computing Applications. United States: N. p., 2018. Web. doi:10.1109/JPROC.2018.2841200.
Balaprakash, Prasanna, Dongarra, Jack, Gamblin, Todd, Hall, Mary, Hollingsworth, Jeffrey K., Norris, Boyana, & Vuduc, Richard. Autotuning in High-Performance Computing Applications. United States. https://doi.org/10.1109/JPROC.2018.2841200
Balaprakash, Prasanna, Dongarra, Jack, Gamblin, Todd, Hall, Mary, Hollingsworth, Jeffrey K., Norris, Boyana, and Vuduc, Richard. 2018. "Autotuning in High-Performance Computing Applications". United States. https://doi.org/10.1109/JPROC.2018.2841200. https://www.osti.gov/servlets/purl/1488544.
@article{osti_1488544,
title = {Autotuning in High-Performance Computing Applications},
author = {Balaprakash, Prasanna and Dongarra, Jack and Gamblin, Todd and Hall, Mary and Hollingsworth, Jeffrey K. and Norris, Boyana and Vuduc, Richard},
abstractNote = {Autotuning refers to the automatic generation of a search space of possible implementations of a computation that are evaluated through models and/or empirical measurement to identify the most desirable implementation. Autotuning has the potential to dramatically improve the performance portability of petascale and exascale applications. To date, autotuning has been used primarily in high-performance applications through tunable libraries or previously tuned application code that is integrated directly into the application. This paper draws on the authors' extensive experience applying autotuning to high-performance applications, describing both successes and future challenges. If autotuning is to be widely used in the HPC community, researchers must address the software engineering challenges, manage configuration overheads, and continue to demonstrate significant performance gains and portability across architectures. In particular, tools that configure the application must be integrated into the application build process so that tuning can be reapplied as the application and target architectures evolve.},
doi = {10.1109/JPROC.2018.2841200},
journal = {Proceedings of the IEEE},
number = 11,
volume = 106,
place = {United States},
year = {2018},
month = {7}
}

Citation Metrics:
Cited by: 54 works
Citation information provided by Web of Science
