OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Parameterizing loop fusion for automated empirical tuning

Abstract

Traditional compilers are limited in their ability to optimize applications for different architectures because statically modeling the effect of specific optimizations on different hardware implementations is difficult. Recent research has been addressing this issue through the use of empirical tuning, which uses trial executions to determine the optimization parameters that are most effective on a particular hardware platform. In this paper, we investigate empirical tuning of loop fusion, an important transformation for optimizing a significant class of real-world applications. In spite of its usefulness, fusion has attracted little attention from previous empirical tuning research, partially because it is much harder to configure than transformations like loop blocking and unrolling. This paper presents novel compiler techniques that extend conventional fusion algorithms to parameterize their output when optimizing a computation, thus allowing the compiler to formulate the entire configuration space for loop fusion using a sequence of integer parameters. The compiler can then employ an external empirical search engine to find the optimal operating point within the space of legal fusion configurations and generate the final optimized code using a simple code transformation system. We have implemented our approach within our compiler infrastructure and conducted preliminary experiments using a simple empirical search strategy. Our results convey new insights on the interaction of loop fusion with limited hardware resources, such as available registers, while confirming conventional wisdom about the effectiveness of loop fusion in improving application performance.
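
As a rough illustration of the idea in the abstract, the following Python sketch encodes a fusion configuration as a sequence of integer parameters and searches it by trial execution. It is not the report's algorithm: the binary "fuse with previous loop" encoding, the three element-wise loop bodies, the helper names (`groups_from_params`, `run`, `search`), and the assumption that every adjacent pair of loops is legal to fuse are all hypothetical choices made for this example.

```python
import itertools
import time


def groups_from_params(n_loops, params):
    """Decode n_loops - 1 binary parameters into fusion groups.

    Hypothetical encoding: params[k - 1] == 1 means loop k is fused
    into the same group as loop k - 1; 0 starts a new group.
    """
    groups = [[0]]
    for k, fuse in enumerate(params, start=1):
        if fuse:
            groups[-1].append(k)
        else:
            groups.append([k])
    return groups


# Three element-wise loop bodies. Each touches only index i, so any
# adjacent fusion is legal here (an assumption of this toy example).
def body0(a, b, c, i):
    b[i] = a[i] * 2.0


def body1(a, b, c, i):
    c[i] = b[i] + 1.0


def body2(a, b, c, i):
    c[i] = c[i] * a[i]


BODIES = [body0, body1, body2]


def run(bodies, groups, n):
    """Execute the loops; each group shares one traversal of 0..n-1."""
    a = [float(i) for i in range(n)]
    b = [0.0] * n
    c = [0.0] * n
    for group in groups:
        fused = [bodies[k] for k in group]
        for i in range(n):
            for body in fused:
                body(a, b, c, i)
    return c


def search(bodies, n=20000):
    """Exhaustive empirical search: time every configuration once."""
    best = None
    for params in itertools.product((0, 1), repeat=len(bodies) - 1):
        groups = groups_from_params(len(bodies), params)
        start = time.perf_counter()
        run(bodies, groups, n)
        elapsed = time.perf_counter() - start
        if best is None or elapsed < best[1]:
            best = (params, elapsed)
    return best


if __name__ == "__main__":
    params, seconds = search(BODIES)
    print("fastest configuration:", params, "in", seconds, "s")
```

Every configuration computes the same result, so only the measured time differs. A realistic tuner would restrict the space to configurations that dependence analysis proves legal, repeat each timing several times, and use a search strategy that scales better than full enumeration.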

Authors:
Zhao, Y; Yi, Q; Kennedy, K; Quinlan, D; Vuduc, R
Publication Date:
2005-12-15
Research Org.:
Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
890608
Report Number(s):
UCRL-TR-217808
TRN: US200620%%749
DOE Contract Number:
W-7405-ENG-48
Resource Type:
Technical Report
Country of Publication:
United States
Language:
English
Subject:
99 GENERAL AND MISCELLANEOUS//MATHEMATICS, COMPUTING, AND INFORMATION SCIENCE; ALGORITHMS; CONFIGURATION; ENGINES; OPTIMIZATION; PERFORMANCE; SIMULATION; TRANSFORMATIONS; TUNING

Citation Formats

Zhao, Y, Yi, Q, Kennedy, K, Quinlan, D, and Vuduc, R. Parameterizing loop fusion for automated empirical tuning. United States: N. p., 2005. Web. doi:10.2172/890608.
Zhao, Y, Yi, Q, Kennedy, K, Quinlan, D, & Vuduc, R. (2005). Parameterizing loop fusion for automated empirical tuning. United States. doi:10.2172/890608.
Zhao, Y, Yi, Q, Kennedy, K, Quinlan, D, and Vuduc, R. 2005. "Parameterizing loop fusion for automated empirical tuning". United States. doi:10.2172/890608. https://www.osti.gov/servlets/purl/890608.
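@techreport{osti_890608,
institution = {Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)},
number = {UCRL-TR-217808},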
title = {Parameterizing loop fusion for automated empirical tuning},
author = {Zhao, Y and Yi, Q and Kennedy, K and Quinlan, D and Vuduc, R},
abstractNote = {Traditional compilers are limited in their ability to optimize applications for different architectures because statically modeling the effect of specific optimizations on different hardware implementations is difficult. Recent research has been addressing this issue through the use of empirical tuning, which uses trial executions to determine the optimization parameters that are most effective on a particular hardware platform. In this paper, we investigate empirical tuning of loop fusion, an important transformation for optimizing a significant class of real-world applications. In spite of its usefulness, fusion has attracted little attention from previous empirical tuning research, partially because it is much harder to configure than transformations like loop blocking and unrolling. This paper presents novel compiler techniques that extend conventional fusion algorithms to parameterize their output when optimizing a computation, thus allowing the compiler to formulate the entire configuration space for loop fusion using a sequence of integer parameters. The compiler can then employ an external empirical search engine to find the optimal operating point within the space of legal fusion configurations and generate the final optimized code using a simple code transformation system. We have implemented our approach within our compiler infrastructure and conducted preliminary experiments using a simple empirical search strategy. Our results convey new insights on the interaction of loop fusion with limited hardware resources, such as available registers, while confirming conventional wisdom about the effectiveness of loop fusion in improving application performance.},
doi = {10.2172/890608},
url = {https://www.osti.gov/servlets/purl/890608},
place = {United States},
year = {2005},
month = {Dec}
}

Similar Records:
  • This report summarizes our effort and results in building an integrated optimization environment that combines programmable control and empirical tuning of source-to-source compiler optimizations within the framework of multiple existing languages, specifically C, C++, and Fortran. The environment contains two main components: the ROSE analysis engine, which is based on the ROSE C/C++/Fortran2003 source-to-source compiler developed by Co-PI Dr. Quinlan et al. at DOE/LLNL, and the POET transformation engine, which is based on an interpreted program transformation language developed by Dr. Yi at University of Texas at San Antonio (UTSA). The ROSE analysis engine performs advanced compiler analysis, identifies profitable code transformations, and then produces output in POET, a language designed to provide programmable control of compiler optimizations to application developers and to support the parameterization of architecture-sensitive optimizations so that their configurations can be empirically tuned later. This POET output can then be ported to different machines together with the user application, where a POET-based search engine empirically reconfigures the parameterized optimizations until satisfactory performance is found. Computational specialists can write POET scripts to directly control the optimization of their code. Application developers can interact with ROSE to obtain optimization feedback as well as provide domain-specific knowledge and high-level optimization strategies. The optimization environment is expected to support different levels of automation and programmer intervention, from fully automated tuning to semi-automated development and manual programmable control.
  • Radiologists miss approximately 25-30% of all pulmonary nodules smaller than 1.0 cm in mass screenings. This paper describes a system for the automated detection of pulmonary nodules. It aids the radiologist by indicating the sites in the radiograph most likely to be nodules. Procedurally-driven image experts that respond to specific types of anatomic features are incorporated in a pattern recognizer which uses linear discriminant analysis to classify the candidate nodule sites. Sites not classified as nodules are eliminated from the list of sites presented to the radiologist for inspection. This system has been tested on 43 chest radiographs, and has demonstrated that pattern recognition techniques and procedurally-driven image experts are capable of reducing the number of sites that a radiologist must inspect from at most 17 to at most 3 in order to be 99% confident of having inspected any nodule detected by the system that is trained with 37 films.
  • An automated cavity tuning procedure has been implemented in the CEBAF control system to tune the superconducting RF (SRF) cavities to their operating frequency of 1497 MHz. The capture range for the coarse tuning algorithm (Burst Mode) is more than 20 cavity bandwidths (5 kHz). The fine tuning algorithm (Sweep Mode) calibrates the phase offset in the detuning angle measurement. This paper describes the implementation of these algorithms and the experience of operating them in the CEBAF control system. 3 refs., 5 figs.