OSTI.GOV | U.S. Department of Energy
Office of Scientific and Technical Information

Title: POET: Parameterized Optimization for Empirical Tuning

Abstract

The excessive complexity of both machine architectures and applications has made it difficult for compilers to statically model and predict application behavior. This observation motivates the recent interest in performance tuning using empirical techniques. We present a new embedded scripting language, POET (Parameterized Optimization for Empirical Tuning), for parameterizing complex code transformations so that they can be empirically tuned. The POET language aims to significantly improve the generality, flexibility, and efficiency of existing empirical tuning systems. We have used the language to parameterize and empirically tune three loop optimizations (interchange, blocking, and unrolling) for two linear algebra kernels. We show experimentally that the time required to tune these optimizations using POET, which does not require any program analysis, is significantly shorter than the time required by a full compiler-based source-code optimizer that performs sophisticated program analysis and optimizations.
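The approach the abstract describes is to expose transformation parameters (block sizes, unroll factors) and search them empirically by timing candidate variants, rather than relying on static analysis. A minimal Python sketch of that idea (not POET's actual syntax; the kernel, candidate set, and single-run timing are illustrative assumptions):

```python
import time

def matmul_blocked(A, B, n, bs):
    """Blocked matrix multiply on flat row-major lists (illustrative only).

    bs is the tunable block size; a POET-style tuner would treat it as a
    parameter to be searched empirically rather than chosen by analysis.
    """
    C = [0.0] * (n * n)
    for ii in range(0, n, bs):
        for kk in range(0, n, bs):
            for jj in range(0, n, bs):
                for i in range(ii, min(ii + bs, n)):
                    for k in range(kk, min(kk + bs, n)):
                        a = A[i * n + k]
                        for j in range(jj, min(jj + bs, n)):
                            C[i * n + j] += a * B[k * n + j]
    return C

def tune(n=64, candidates=(4, 8, 16, 32)):
    """Empirical tuning in miniature: no program analysis, just run and time
    each candidate block size and keep the fastest."""
    A = [1.0] * (n * n)
    B = [1.0] * (n * n)
    best = None
    for bs in candidates:
        t0 = time.perf_counter()
        matmul_blocked(A, B, n, bs)
        elapsed = time.perf_counter() - t0
        if best is None or elapsed < best[1]:
            best = (bs, elapsed)
    return best[0]
```

A real tuner would time each candidate several times on the target machine and search a multi-dimensional space (block size, unroll factor, loop order together); the single timing per candidate here only keeps the sketch short.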

Authors:
Yi, Q; Seymour, K; You, H; Vuduc, R; Quinlan, D
Publication Date:
2007-01-29
Research Org.:
Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
908082
Report Number(s):
UCRL-PROC-227558
TRN: US200722%%408
DOE Contract Number:
W-7405-ENG-48
Resource Type:
Conference
Resource Relation:
Conference: IPDPS Workshop on Performance Optimization of High-Level Languages and Libraries, Long Beach, CA, United States, Mar 30, 2007
Country of Publication:
United States
Language:
English
Subject:
99 GENERAL AND MISCELLANEOUS//MATHEMATICS, COMPUTING, AND INFORMATION SCIENCE; ALGEBRA; EFFICIENCY; FLEXIBILITY; KERNELS; OPTIMIZATION; PERFORMANCE; TRANSFORMATIONS; TUNING; COMPUTER ARCHITECTURE

Citation Formats

Yi, Q, Seymour, K, You, H, Vuduc, R, and Quinlan, D. POET: Parameterized Optimization for Empirical Tuning. United States: N. p., 2007. Web.
Yi, Q, Seymour, K, You, H, Vuduc, R, & Quinlan, D. POET: Parameterized Optimization for Empirical Tuning. United States.
Yi, Q, Seymour, K, You, H, Vuduc, R, and Quinlan, D. 2007. "POET: Parameterized Optimization for Empirical Tuning". United States. https://www.osti.gov/servlets/purl/908082.
@article{osti_908082,
title = {POET: Parameterized Optimization for Empirical Tuning},
author = {Yi, Q and Seymour, K and You, H and Vuduc, R and Quinlan, D},
abstractNote = {The excessive complexity of both machine architectures and applications has made it difficult for compilers to statically model and predict application behavior. This observation motivates the recent interest in performance tuning using empirical techniques. We present a new embedded scripting language, POET (Parameterized Optimization for Empirical Tuning), for parameterizing complex code transformations so that they can be empirically tuned. The POET language aims to significantly improve the generality, flexibility, and efficiency of existing empirical tuning systems. We have used the language to parameterize and empirically tune three loop optimizations (interchange, blocking, and unrolling) for two linear algebra kernels. We show experimentally that the time required to tune these optimizations using POET, which does not require any program analysis, is significantly shorter than the time required by a full compiler-based source-code optimizer that performs sophisticated program analysis and optimizations.},
place = {United States},
year = {2007},
month = {jan}
}

Other availability
Please see Document Availability for additional information on obtaining the full-text document. Library patrons may search WorldCat to identify libraries that hold this conference proceeding.

Similar Records:
  • Auto-tuning has emerged as an important practical method for creating highly optimized implementations of key computational kernels and applications. However, the growing complexity of architectures and applications is creating new challenges for auto-tuning. Complex applications can involve a prohibitively large search space that precludes empirical auto-tuning. Similarly, architectures are becoming increasingly complicated, making it hard to model performance. In this paper, we focus on the challenge to auto-tuning presented by applications with a large number of kernels and kernel instantiations. While these kernels may share a somewhat similar pattern, they differ considerably in problem sizes and the exact computation performed. We propose and evaluate a new approach to auto-tuning which we refer to as parameterized micro-benchmarking. It is an alternative to the two existing classes of approaches to auto-tuning: analytical model-based and empirical search-based. Particularly, we argue that the former may not be able to capture all the architectural features that impact performance, whereas the latter might be too expensive for an application that has several different kernels. In our approach, different expressions in the application, different possible implementations of each expression, and the key architectural features are used to derive a simple micro-benchmark and a small parameter space. This allows us to learn the most significant features of the architecture that can impact the choice of implementation for each kernel. We have evaluated our approach in the context of GPU implementations of tensor contraction expressions encountered in excited-state calculations in quantum chemistry. We have focused on two aspects of GPUs that affect tensor contraction execution: memory access patterns and kernel consolidation. Using our parameterized micro-benchmarking approach, we obtain a speedup of up to 2 over the version that used default optimizations but no auto-tuning. We demonstrate that observations made from micro-benchmarks match the behavior seen from real expressions. In the process, we make important observations about the memory hierarchy of two of the most recent NVIDIA GPUs, which can be used in other optimization frameworks as well.
  • An intelligent parallel regulator is proposed for auto-tuning industrial controllers. The proposed controller uses a fixed-parameter controller in parallel with a parametrically optimized adaptive self-tuning regulator. A Decision Block performs the logical and functional paralleling of the two controllers to achieve a desired performance. Example results are presented which show that the proposed technique outperforms fixed-parameter conventional controllers in both dynamic performance and reliability.
  • This report summarizes our effort and results in building an integrated optimization environment that effectively combines the programmable control and the empirical tuning of source-to-source compiler optimizations within the framework of multiple existing languages, specifically C, C++, and Fortran. The environment contains two main components: the ROSE analysis engine, which is based on the ROSE C/C++/Fortran2003 source-to-source compiler developed by Co-PI Dr. Quinlan et al. at DOE/LLNL, and the POET transformation engine, which is based on an interpreted program transformation language developed by Dr. Yi at the University of Texas at San Antonio (UTSA). The ROSE analysis engine performs advanced compiler analysis, identifies profitable code transformations, and then produces output in POET, a language designed to provide programmable control of compiler optimizations to application developers and to support the parameterization of architecture-sensitive optimizations so that their configurations can be empirically tuned later. This POET output can then be ported to different machines together with the user application, where a POET-based search engine empirically reconfigures the parameterized optimizations until satisfactory performance is found. Computational specialists can write POET scripts to directly control the optimization of their code. Application developers can interact with ROSE to obtain optimization feedback as well as provide domain-specific knowledge and high-level optimization strategies. The optimization environment is expected to support different levels of automation and programmer intervention, from fully automated tuning to semi-automated development to manual programmable control.
  • Although automated empirical performance optimization and tuning is well studied for kernels and domain-specific libraries, a current research grand challenge is how to extend these methodologies and tools to significantly larger sequential and parallel applications. In this context, we present the ROSE source-to-source outliner, which addresses the problem of extracting tunable kernels out of whole programs, thereby helping to convert the challenging whole-program tuning problem into a set of more manageable kernel-tuning tasks. Our outliner aims to handle large-scale C/C++, Fortran, and OpenMP applications. A set of program analysis and transformation techniques is utilized to enhance the portability, scalability, and interoperability of source-to-source outlining. More importantly, the generated kernels preserve the performance characteristics of tuning targets and can be easily handled by other tools. Preliminary evaluations have shown that the ROSE outliner serves as a key component within an end-to-end empirical optimization system and enables a wide range of sequential and parallel optimization opportunities.
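The parameterized micro-benchmarking idea in the first related record above can be sketched in miniature: derive a tiny benchmark that varies one architectural parameter and time each variant to learn the machine's preference. Here memory access order stands in, as a hypothetical example, for the paper's GPU-specific parameters:

```python
import time

def traverse_row_major(M, n):
    """Sum an n-by-n flat matrix in row-major (stride-1) order."""
    s = 0.0
    for i in range(n):
        for j in range(n):
            s += M[i * n + j]
    return s

def traverse_col_major(M, n):
    """Sum the same matrix in column-major (stride-n) order."""
    s = 0.0
    for j in range(n):
        for i in range(n):
            s += M[i * n + j]
    return s

def micro_benchmark(n=512):
    """A two-point parameter space: which access pattern does this machine
    prefer?  The winner would then guide the implementation choice for every
    kernel sharing that access pattern, without tuning each kernel separately."""
    M = [1.0] * (n * n)
    results = {}
    for name, fn in [("row", traverse_row_major), ("col", traverse_col_major)]:
        t0 = time.perf_counter()
        fn(M, n)
        results[name] = time.perf_counter() - t0
    return min(results, key=results.get)
```

The point of the approach is that the micro-benchmark is far cheaper to run than empirically tuning every kernel instantiation, yet still exposes the architectural feature that dominates the implementation choice.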