Autotuning in High-Performance Computing Applications
- Argonne National Lab. (ANL), Argonne, IL (United States)
- Univ. of Tennessee, Knoxville, TN (United States)
- Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
- Univ. of Utah, Salt Lake City, UT (United States)
- Univ. of Maryland, College Park, MD (United States)
- Univ. of Oregon, Eugene, OR (United States)
- Georgia Inst. of Technology, Atlanta, GA (United States)
Autotuning refers to the automatic generation of a search space of possible implementations of a computation that are evaluated through models and/or empirical measurement to identify the most desirable implementation. Autotuning has the potential to dramatically improve the performance portability of petascale and exascale applications. To date, autotuning has been used primarily in high-performance applications through tunable libraries or previously tuned application code that is integrated directly into the application. This paper draws on the authors' extensive experience applying autotuning to high-performance applications, describing both successes and future challenges. If autotuning is to be widely used in the HPC community, researchers must address the software engineering challenges, manage configuration overheads, and continue to demonstrate significant performance gains and portability across architectures. In particular, tools that configure the application must be integrated into the application build process so that tuning can be reapplied as the application and target architectures evolve.
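The loop described in the abstract (generate a search space of implementations, measure each one, keep the most desirable) can be sketched in a few lines. This is a minimal illustration only, not the authors' tooling: the toy kernel `blocked_sum`, the parameters `block_size` and `use_builtin`, and the helpers `search_space`, `measure`, and `autotune` are all hypothetical names chosen for this example, and exhaustive search stands in for the model-based or heuristic searches a real autotuner might use.

```python
import itertools
import time


def blocked_sum(data, block_size, use_builtin):
    """Toy kernel: sum `data` block by block using one of two equivalent variants."""
    total = 0.0
    for start in range(0, len(data), block_size):
        block = data[start:start + block_size]
        if use_builtin:
            # Variant A: reduce the block with the built-in sum().
            total += sum(block)
        else:
            # Variant B: reduce the block with an explicit Python loop.
            for x in block:
                total += x
    return total


def search_space():
    """Enumerate every (block_size, use_builtin) configuration to evaluate."""
    return list(itertools.product([64, 256, 1024], [False, True]))


def measure(config, data, repeats=3):
    """Empirical measurement: best wall-clock time over a few repeated runs."""
    block_size, use_builtin = config
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        blocked_sum(data, block_size, use_builtin)
        best = min(best, time.perf_counter() - t0)
    return best


def autotune(data):
    """Exhaustive search: time every configuration and keep the fastest."""
    timings = {config: measure(config, data) for config in search_space()}
    best_config = min(timings, key=timings.get)
    return best_config, timings


if __name__ == "__main__":
    sample = [float(i) for i in range(20_000)]
    best, _ = autotune(sample)
    print("selected configuration (block_size, use_builtin):", best)
```

The selected configuration depends on the machine and interpreter, which is the portability point the abstract makes: rerunning the search on a new target re-derives the choice instead of hard-coding it.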
- Research Organization: Argonne National Laboratory (ANL), Argonne, IL (United States); Lawrence Livermore National Laboratory (LLNL), Livermore, CA (United States)
- Sponsoring Organization: USDOE National Nuclear Security Administration (NNSA); USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)
- Grant/Contract Number: AC02-06CH11357; AC52-07NA27344
- OSTI ID: 1488544
- Alternate ID(s): OSTI ID: 1868859
- Report Number(s): LLNL-JRNL-834240; 147743
- Journal Information: Proceedings of the IEEE, Vol. 106, Issue 11; ISSN 0018-9219
- Publisher: Institute of Electrical and Electronics Engineers
- Country of Publication: United States
- Language: English