Helium: lifting high-performance stencil kernels from stripped x86 binaries to Halide DSL code
Abstract
Highly optimized programs are prone to bit rot, where performance quickly becomes suboptimal in the face of new hardware and compiler techniques. In this paper we show how to automatically lift performance-critical stencil kernels from a stripped x86 binary and generate the corresponding code in the high-level domain-specific language Halide. Using Halide's state-of-the-art optimizations targeting current hardware, we show that new optimized versions of these kernels can replace the originals to rejuvenate the application for newer hardware. The original optimized code for kernels in stripped binaries is nearly impossible to analyze statically. Instead, we rely on dynamic traces to regenerate the kernels. We perform buffer structure reconstruction to identify input, intermediate and output buffer shapes. Here, we abstract from a forest of concrete dependency trees which contain absolute memory addresses to symbolic trees suitable for high-level code generation. This is done by canonicalizing trees, clustering them based on structure, inferring higher-dimensional buffer accesses and finally by solving a set of linear equations based on buffer accesses to lift them up to simple, high-level expressions. Helium can handle highly optimized, complex stencil kernels with input-dependent conditionals. We lift seven kernels from Adobe Photoshop giving a 75% performance improvement, four kernels from IrfanView, leading to 4.97x performance, and one stencil from the miniGMG multigrid benchmark netting a 4.25x improvement in performance. We manually rejuvenated Photoshop by replacing eleven of Photoshop's filters with our lifted implementations, giving 1.12x speedup without affecting the user experience.
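As an illustration of the last step named in the abstract (solving linear equations based on buffer accesses), the following minimal Python sketch recovers an affine index expression from a few traced accesses. It is not Helium's implementation: the function names `solve_linear` and `fit_affine_index`, the sample format, and the restriction to a two-dimensional affine map are assumptions made for this example.

```python
from fractions import Fraction


def solve_linear(A, b):
    """Solve the square system A x = b exactly with Gauss-Jordan elimination."""
    n = len(A)
    # Augmented matrix in exact rational arithmetic (no floating-point error).
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero pivot.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            raise ValueError("samples do not determine a unique affine map")
        M[col], M[pivot] = M[pivot], M[col]
        # Normalize the pivot row, then clear this column in every other row.
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]


def fit_affine_index(samples):
    """Fit index = a*x + b*y + c from traced accesses.

    samples is a list of ((x, y), observed_index) pairs (a hypothetical
    trace format). The first three samples define the linear system and
    must not be collinear; every sample is then checked against the fit.
    """
    A = [[x, y, 1] for (x, y), _ in samples[:3]]
    rhs = [idx for _, idx in samples[:3]]
    a, b, c = solve_linear(A, rhs)
    for (x, y), idx in samples:
        if a * x + b * y + c != idx:
            raise ValueError("access is not affine in (x, y)")
    return a, b, c


if __name__ == "__main__":
    # Hypothetical trace: output pixel (x, y) reads input column x + 1.
    samples = [((0, 0), 1), ((1, 0), 2), ((0, 1), 1), ((5, 7), 6)]
    a, b, c = fit_affine_index(samples)
    print(f"in_x = {a}*x + {b}*y + {c}")  # prints "in_x = 1*x + 0*y + 1"
```

Exact rational arithmetic is used here so that the consistency check against the remaining samples is an equality test rather than a tolerance comparison.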
- Authors:
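- Mendis, Charith; Bosboom, Jeffrey; Wu, Kevin; Kamil, Shoaib; Ragan-Kelley, Jonathan; Paris, Sylvain; Zhao, Qin; Amarasinghe, Saman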
- Massachusetts Inst. of Technology (MIT), Cambridge, MA (United States). Computer Science and Artificial Intelligence Lab. (CSAIL)
- Stanford Univ., Palo Alto, CA (United States)
- Adobe, Cambridge, MA (United States)
- Google, Cambridge, MA (United States)
- Publication Date:
- 2015-06-03
- Research Org.:
- Massachusetts Inst. of Technology (MIT), Cambridge, MA (United States). Computer Science and Artificial Intelligence Lab. (CSAIL)
- Sponsoring Org.:
- USDOE Office of Science (SC); Defense Advanced Research Projects Agency (DARPA)
- OSTI Identifier:
- 1457399
- Grant/Contract Number:
- SC0005288; SC0008923
- Resource Type:
- Accepted Manuscript
- Journal Name:
- ACM SIGPLAN Notices
- Additional Journal Information:
- Journal Volume: 2015; Journal ID: ISSN 0362-1340
- Publisher:
- ACM
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 97 MATHEMATICS AND COMPUTING
Citation Formats
Mendis, Charith, Bosboom, Jeffrey, Wu, Kevin, Kamil, Shoaib, Ragan-Kelley, Jonathan, Paris, Sylvain, Zhao, Qin, and Amarasinghe, Saman. Helium: lifting high-performance stencil kernels from stripped x86 binaries to Halide DSL code. United States: N. p., 2015. Web. doi:10.1145/2737924.2737974.
Mendis, Charith, Bosboom, Jeffrey, Wu, Kevin, Kamil, Shoaib, Ragan-Kelley, Jonathan, Paris, Sylvain, Zhao, Qin, & Amarasinghe, Saman. Helium: lifting high-performance stencil kernels from stripped x86 binaries to Halide DSL code. United States. https://doi.org/10.1145/2737924.2737974
Mendis, Charith, Bosboom, Jeffrey, Wu, Kevin, Kamil, Shoaib, Ragan-Kelley, Jonathan, Paris, Sylvain, Zhao, Qin, and Amarasinghe, Saman. 2015. "Helium: lifting high-performance stencil kernels from stripped x86 binaries to Halide DSL code". United States. https://doi.org/10.1145/2737924.2737974. https://www.osti.gov/servlets/purl/1457399.
@article{osti_1457399,
title = {Helium: lifting high-performance stencil kernels from stripped x86 binaries to Halide DSL code},
author = {Mendis, Charith and Bosboom, Jeffrey and Wu, Kevin and Kamil, Shoaib and Ragan-Kelley, Jonathan and Paris, Sylvain and Zhao, Qin and Amarasinghe, Saman},
abstractNote = {Highly optimized programs are prone to bit rot, where performance quickly becomes suboptimal in the face of new hardware and compiler techniques. In this paper we show how to automatically lift performance-critical stencil kernels from a stripped x86 binary and generate the corresponding code in the high-level domain-specific language Halide. Using Halide's state-of-the-art optimizations targeting current hardware, we show that new optimized versions of these kernels can replace the originals to rejuvenate the application for newer hardware. The original optimized code for kernels in stripped binaries is nearly impossible to analyze statically. Instead, we rely on dynamic traces to regenerate the kernels. We perform buffer structure reconstruction to identify input, intermediate and output buffer shapes. Here, we abstract from a forest of concrete dependency trees which contain absolute memory addresses to symbolic trees suitable for high-level code generation. This is done by canonicalizing trees, clustering them based on structure, inferring higher-dimensional buffer accesses and finally by solving a set of linear equations based on buffer accesses to lift them up to simple, high-level expressions. Helium can handle highly optimized, complex stencil kernels with input-dependent conditionals. We lift seven kernels from Adobe Photoshop giving a 75% performance improvement, four kernels from IrfanView, leading to 4.97x performance, and one stencil from the miniGMG multigrid benchmark netting a 4.25x improvement in performance. We manually rejuvenated Photoshop by replacing eleven of Photoshop's filters with our lifted implementations, giving 1.12x speedup without affecting the user experience.},
doi = {10.1145/2737924.2737974},
journal = {ACM SIGPLAN Notices},
volume = 2015,
place = {United States},
year = {2015},
month = {jun}
}