DOE Patents, U.S. Department of Energy
Office of Scientific and Technical Information

Title: System, methods and apparatus for program optimization for multi-threaded processor architectures

Abstract

Methods, apparatus and computer software product for source code optimization are provided. In an exemplary embodiment, a first custom computing apparatus is used to optimize the execution of source code on a second computing apparatus. In this embodiment, the first custom computing apparatus contains a memory, a storage medium and at least one processor with at least one multi-stage execution unit. The second computing apparatus contains at least two multi-stage execution units that allow for parallel execution of tasks. The first custom computing apparatus optimizes the code for parallelism, locality of operations and contiguity of memory accesses on the second computing apparatus. This Abstract is provided for the sole purpose of complying with the Abstract requirement rules. This Abstract is submitted with the explicit understanding that it will not be used to interpret or to limit the scope or the meaning of the claims.

Inventors:
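Bastoul, Cedric; Lethin, Richard A; Leung, Allen K; Meister, Benoit J; Szilagyi, Peter; Vasilache, Nicolas T; Wohlford, David E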
Issue Date:
Research Org.:
Reservoir Labs, Inc., New York, NY (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1167014
Patent Number(s):
8,930,926
Application Number:
12/762,281
Assignee:
Reservoir Labs, Inc. (New York, NY)
DOE Contract Number:  
FG02-08ER85149
Resource Type:
Patent
Resource Relation:
Patent File Date: 2010 Apr 16
Country of Publication:
United States
Language:
English
Subject:
42 ENGINEERING

Citation Formats

Bastoul, Cedric, Lethin, Richard A, Leung, Allen K, Meister, Benoit J, Szilagyi, Peter, Vasilache, Nicolas T, and Wohlford, David E. System, methods and apparatus for program optimization for multi-threaded processor architectures. United States: N. p., 2015. Web.
Bastoul, Cedric, Lethin, Richard A, Leung, Allen K, Meister, Benoit J, Szilagyi, Peter, Vasilache, Nicolas T, & Wohlford, David E. System, methods and apparatus for program optimization for multi-threaded processor architectures. United States.
Bastoul, Cedric, Lethin, Richard A, Leung, Allen K, Meister, Benoit J, Szilagyi, Peter, Vasilache, Nicolas T, and Wohlford, David E. 2015. "System, methods and apparatus for program optimization for multi-threaded processor architectures". United States. https://www.osti.gov/servlets/purl/1167014.
@article{osti_1167014,
title = {System, methods and apparatus for program optimization for multi-threaded processor architectures},
author = {Bastoul, Cedric and Lethin, Richard A and Leung, Allen K and Meister, Benoit J and Szilagyi, Peter and Vasilache, Nicolas T and Wohlford, David E},
abstractNote = {Methods, apparatus and computer software product for source code optimization are provided. In an exemplary embodiment, a first custom computing apparatus is used to optimize the execution of source code on a second computing apparatus. In this embodiment, the first custom computing apparatus contains a memory, a storage medium and at least one processor with at least one multi-stage execution unit. The second computing apparatus contains at least two multi-stage execution units that allow for parallel execution of tasks. The first custom computing apparatus optimizes the code for parallelism, locality of operations and contiguity of memory accesses on the second computing apparatus. This Abstract is provided for the sole purpose of complying with the Abstract requirement rules. This Abstract is submitted with the explicit understanding that it will not be used to interpret or to limit the scope or the meaning of the claims.},
place = {United States},
year = {2015},
month = {1}
}

Works referenced in this record:

An accurate cost model for guiding data locality transformations
journal, September 2005

  • Vera, Xavier; Abella, Jaume; Llosa, Josep
  • ACM Transactions on Programming Languages and Systems, Vol. 27, Issue 5
  • DOI: 10.1145/1086642.1086646

Impact of memory hierarchy on program partitioning and scheduling
conference, January 1995

  • Kaplow, W. K.; Maniatty, W. A.; Szymanski, B. K.
  • Twenty-Eighth Annual Hawaii International Conference on System Sciences, Proceedings of the Twenty-Eighth Hawaii International Conference on System Sciences
  • DOI: 10.1109/HICSS.1995.375473

Semi-Automatic Composition of Loop Transformations for Deep Parallelism and Memory Hierarchies
journal, June 2006

  • Girbal, Sylvain; Vasilache, Nicolas; Bastoul, Cédric
  • International Journal of Parallel Programming, Vol. 34, Issue 3
  • DOI: 10.1007/s10766-006-0012-3

Verifying safety properties of a class of infinite-state distributed algorithms
book, January 1995


Synthesizing transformations for locality enhancement of imperfectly-nested loop nests
conference, January 2000

  • Ahmed, Nawaaz; Mateev, Nikolay; Pingali, Keshav
  • Proceedings of the 14th international conference on Supercomputing - ICS '00
  • DOI: 10.1145/335231.335245

Tiling Imperfectly-nested Loop Nests
conference, January 2000


Efficient string matching: an aid to bibliographic search
journal, June 1975

  • Aho, Alfred V.; Corasick, Margaret J.
  • Communications of the ACM, Vol. 18, Issue 6
  • DOI: 10.1145/360825.360855

Configurable string matching hardware for speeding up intrusion detection
journal, March 2005

  • Aldwairi, Monther; Conte, Thomas; Franzon, Paul
  • ACM SIGARCH Computer Architecture News, Vol. 33, Issue 1
  • DOI: 10.1145/1055626.1055640

Conversion of control dependence to data dependence
conference, January 1983

  • Allen, J. R.; Kennedy, Ken; Porterfield, Carrie
  • Proceedings of the 10th ACM SIGACT-SIGPLAN symposium on Principles of programming languages - POPL '83
  • DOI: 10.1145/567067.567085

Scanning polyhedra with DO loops
journal, July 1991


On the (Im)possibility of Obfuscating Programs
book, January 2001

  • Barak, Boaz; Goldreich, Oded; Impagliazzo, Russell
  • Advances in Cryptology — CRYPTO 2001
  • DOI: 10.1007/3-540-44647-8_1

A practical automatic polyhedral parallelizer and locality optimizer
journal, May 2008

  • Bondhugula, Uday; Hartono, Albert; Ramanujam, J.
  • ACM SIGPLAN Notices, Vol. 43, Issue 6
  • DOI: 10.1145/1379022.1375595

Automatic mapping of nested loops to FPGAs
conference, January 2007

  • Ramanujam, J.; Sadayappan, P.
  • Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming - PPoPP '07
  • DOI: 10.1145/1229428.1229446

Scanning polyhedra without Do-loops
conference, January 1998

  • Boulet, P.; Feautrier, P.
  • Proceedings. 1998 International Conference on Parallel Architectures and Compilation Techniques (Cat. No.98EX192)
  • DOI: 10.1109/PACT.1998.727127

Effective partial redundancy elimination
conference, January 1994

  • Briggs, Preston; Cooper, Keith D.
  • Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation - PLDI '94
  • DOI: 10.1145/178243.178257

Towards automatic generation of vulnerability-based signatures
conference, January 2006

  • Brumley, D.; Newsome, J.; Song, D.
  • 2006 IEEE Symposium on Security and Privacy (S&P'06)
  • DOI: 10.1109/SP.2006.41

Scaling to the end of silicon with EDGE architectures
journal, July 2004

  • Burger, D.; Keckler, S. W.; McKinley, K. S.
  • Computer, Vol. 37, Issue 7
  • DOI: 10.1109/MC.2004.65

Flow-insensitive interprocedural alias analysis in the presence of pointers
book, June 2005

  • Burke, Michael; Carini, Paul; Choi, Jong-Deok
  • Languages and Compilers for Parallel Computing, p. 234-250
  • DOI: 10.1007/BFb0025882

Automatic memory layout transformations to optimize spatial locality in parameterized loop nests
journal, March 2000

  • Clauss, Philippe; Meister, Benoît
  • ACM SIGARCH Computer Architecture News, Vol. 28, Issue 1
  • DOI: 10.1145/346023.346031

Global code motion/global value numbering
journal, June 1995


A simple graph-based intermediate representation
journal, March 1995


Manufacturing cheap, resilient, and stealthy opaque constructs
conference, January 1998

  • Collberg, Christian; Thomborson, Clark; Low, Douglas
  • Proceedings of the 25th ACM SIGPLAN-SIGACT symposium on Principles of programming languages - POPL '98
  • DOI: 10.1145/268946.268962

Operator strength reduction
journal, September 2001

  • Cooper, Keith D.; Simpson, L. Taylor; Vick, Christopher A.
  • ACM Transactions on Programming Languages and Systems, Vol. 23, Issue 5
  • DOI: 10.1145/504709.504710

Abstract interpretation: a unified lattice model for static analysis of programs by construction or approximation of fixpoints
conference, January 1977

  • Cousot, Patrick; Cousot, Radhia
  • Proceedings of the 4th ACM SIGACT-SIGPLAN symposium on Principles of programming languages - POPL '77
  • DOI: 10.1145/512950.512973

Lattice-based memory allocation
conference, January 2003

  • Darte, Alain; Schreiber, Rob; Villard, Gilles
  • Proceedings of the international conference on Compilers, architectures and synthesis for embedded systems - CASES '03
  • DOI: 10.1145/951710.951749

Lattice-Based Memory Allocation
journal, October 2005

  • Darte, A.; Schreiber, R.; Villard, G.
  • IEEE Transactions on Computers, Vol. 54, Issue 10
  • DOI: 10.1109/TC.2005.167

Revisiting the decomposition of Karp, Miller and Winograd
conference, January 1995

  • Darte, A.; Vivien, F.
  • Proceedings The International Conference on Application Specific Array Processors
  • DOI: 10.1109/ASAP.1995.522901

Some efficient solutions to the affine scheduling problem. I. One-dimensional time
journal, October 1992

  • Feautrier, Paul
  • International Journal of Parallel Programming, Vol. 21, Issue 5, p. 313-347
  • DOI: 10.1007/BF01407835

Dataflow analysis of array and scalar references
journal, February 1991

  • Feautrier, Paul
  • International Journal of Parallel Programming, Vol. 20, Issue 1, p. 23-53
  • DOI: 10.1007/BF01407931

Some efficient solutions to the affine scheduling problem. Part II. Multidimensional time
journal, December 1992

  • Feautrier, Paul
  • International Journal of Parallel Programming, Vol. 21, Issue 6
  • DOI: 10.1007/BF01379404

The program dependence graph and its use in optimization
journal, July 1987

  • Ferrante, Jeanne; Ottenstein, Karl J.; Warren, Joe D.
  • ACM Transactions on Programming Languages and Systems, Vol. 9, Issue 3
  • DOI: 10.1145/24039.24041

The Z-polyhedral model
conference, January 2007

  • Gupta, Gautam; Rajopadhye, Sanjay
  • Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming - PPoPP '07
  • DOI: 10.1145/1229428.1229478

Iterated register coalescing
journal, May 1996

  • George, Lal; Appel, Andrew W.
  • ACM Transactions on Programming Languages and Systems, Vol. 18, Issue 3
  • DOI: 10.1145/229542.229546

Cache miss equations: a compiler framework for analyzing and tuning memory behavior
journal, July 1999

  • Ghosh, Somnath; Martonosi, Margaret; Malik, Sharad
  • ACM Transactions on Programming Languages and Systems, Vol. 21, Issue 4
  • DOI: 10.1145/325478.325479

Symbolic array dataflow analysis for array privatization and program parallelization
conference, January 1995

  • Gu, Junjie; Li, Zhiyuan; Lee, Gyungho
  • Proceedings of the 1995 ACM/IEEE conference on Supercomputing (CDROM) - Supercomputing '95
  • DOI: 10.1145/224170.224318

Ultra-fast aliasing analysis using CLA: a million lines of C code in a second
journal, May 2001


Supernode partitioning
conference, January 1988

  • Irigoin, F.; Triolet, R.
  • Proceedings of the 15th ACM SIGPLAN-SIGACT symposium on Principles of programming languages - POPL '88
  • DOI: 10.1145/73560.73588

Register tiling in nonrectangular iteration spaces
journal, July 2002

  • Jiménez, Marta; Llabería, José M.; Fernández, Agustín
  • ACM Transactions on Programming Languages and Systems, Vol. 24, Issue 4
  • DOI: 10.1145/567097.567101

Code generation for multiple mappings
conference, January 1994

  • Kelly, W.; Pugh, W.; Rosser, E.
  • Proceedings Frontiers '95. The Fifth Symposium on the Frontiers of Massively Parallel Computation
  • DOI: 10.1109/FMPC.1995.380437

Partial dead code elimination
journal, June 1994

  • Knoop, Jens; Rüthing, Oliver; Steffen, Bernhard
  • ACM SIGPLAN Notices, Vol. 29, Issue 6
  • DOI: 10.1145/773473.178256

An experimental evaluation of tiling and shackling for memory hierarchy management
conference, January 1999

  • Kodukula, Induprakas; Pingali, Keshav; Cox, Robert
  • Proceedings of the 13th international conference on Supercomputing - ICS '99
  • DOI: 10.1145/305138.305243

Software pipelining: an effective scheduling technique for VLIW machines
journal, July 1988


Undecidability of static analysis
journal, December 1992

  • Landi, William
  • ACM Letters on Programming Languages and Systems, Vol. 1, Issue 4
  • DOI: 10.1145/161494.161501

A fast algorithm for finding dominators in a flowgraph
journal, January 1979

  • Lengauer, Thomas; Tarjan, Robert Endre
  • ACM Transactions on Programming Languages and Systems, Vol. 1, Issue 1
  • DOI: 10.1145/357062.357071

Blocking and array contraction across arbitrarily nested loops using affine partitioning
journal, July 2001

  • Lim, Amy W.; Liao, Shih-Wei; Lam, Monica S.
  • ACM SIGPLAN Notices, Vol. 36, Issue 7
  • DOI: 10.1145/568014.379586

Maximizing parallelism and minimizing synchronization with affine transforms
conference, January 1997

  • Lim, Amy W.; Lam, Monica S.
  • Proceedings of the 24th ACM SIGPLAN-SIGACT symposium on Principles of programming languages - POPL '97
  • DOI: 10.1145/263699.263719

Normalised Givens rotations for recursive least squares processing
conference, January 1995


Array-data flow analysis and its use in array privatization
conference, January 1993

  • Maydan, Dror E.; Amarasinghe, Saman P.; Lam, Monica S.
  • Proceedings of the 20th ACM SIGPLAN-SIGACT symposium on Principles of programming languages - POPL '93
  • DOI: 10.1145/158511.158515

Optimal weighted loop fusion for parallel programs
conference, January 1997

  • Megiddo, Nimrod; Sarkar, Vivek
  • Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures - SPAA '97
  • DOI: 10.1145/258492.258520

Generation of Efficient Nested Loops from Polyhedra
journal, October 2000

  • Quillere, Fabien; Rajopadhye, Sanjay; Wilde, Doran
  • International Journal of Parallel Programming, Vol. 28, Issue 5, p. 469-498
  • DOI: 10.1023/A:1007554627716

The mapping of linear recurrence equations on regular arrays
journal, October 1989

  • Quinton, Patrice; van Dongen, Vincent
  • Journal of VLSI signal processing systems for signal, image and video technology, Vol. 1, Issue 2, p. 95-113
  • DOI: 10.1007/BF02477176

Adaptive array beamforming with fixed-point arithmetic matrix inversion using Givens rotations
conference, November 2001


Iterative modulo scheduling: an algorithm for software pipelining loops
conference, January 1994

  • Rau, B. Ramakrishna
  • Proceedings of the 27th annual international symposium on Microarchitecture - MICRO 27
  • DOI: 10.1145/192724.192731

A Geometric Programming Framework for Optimal Multi-Level Tiling
conference, January 2004

  • Renganarayana, L.; Rajopadhye, S.
  • Proceedings of the ACM/IEEE SC2004 Conference
  • DOI: 10.1109/SC.2004.3

Distributed Microarchitectural Protocols in the TRIPS Prototype Processor
conference, December 2006

  • Sankaralingam, Karthikeyan; Nagarajan, Ramadass; McDonald, Robert
  • 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06)
  • DOI: 10.1109/MICRO.2006.19

Memory optimization by counting points in integer transformations of parametric polytopes
conference, January 2006

  • Seghir, Rachid; Loechner, Vincent
  • Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems - CASES '06
  • DOI: 10.1145/1176760.1176771

A Compiler Framework for Tiling Imperfectly-Nested Loops
book, January 2000

  • Song, Yonghong; Li, Zhiyuan; Goos, Gerhard
  • Languages and Compilers for Parallel Computing
  • DOI: 10.1007/3-540-44905-1_12

Early Control of Register Pressure for Software Pipelined Loops
book, January 2003


Deobfuscation: Reverse Engineering Obfuscated Code
conference, January 2005

  • Udupa, S. K.; Debray, S. K.; Madou, M.
  • 12th Working Conference on Reverse Engineering (WCRE'05)
  • DOI: 10.1109/WCRE.2005.13

Polyhedral Code Generation in the Real World
book, January 2006

  • Vasilache, Nicolas; Bastoul, Cedric; Cohen, Albert
  • Compiler Construction, p. 185-201
  • DOI: 10.1007/11688839_16

Counting Integer Points in Parametric Polytopes Using Barvinok's Rational Functions
journal, February 2007


Constant propagation with conditional branches
journal, April 1991

  • Wegman, Mark N.; Zadeck, F. Kenneth
  • ACM Transactions on Programming Languages and Systems, Vol. 13, Issue 2
  • DOI: 10.1145/103135.103136

Value dependence graphs: representation without taxation
conference, January 1994

  • Weise, Daniel; Crew, Roger F.; Ernst, Michael
  • Proceedings of the 21st ACM SIGPLAN-SIGACT symposium on Principles of programming languages - POPL '94
  • DOI: 10.1145/174675.177907

An Efficient Inclusion-Based Points-To Analysis for Strictly-Typed Languages
book, January 2002


A Library for Doing Polyhedral Operations
journal, December 2000


A data locality optimizing algorithm
journal, June 1991


Enabling Loop Fusion and Tiling for Cache Performance by Fixing Fusion-Preventing Data Dependences
conference, January 2005

  • Xue, Jingling
  • 2005 International Conference on Parallel Processing (ICPP'05)
  • DOI: 10.1109/ICPP.2005.37

Static branch frequency and program profile analysis
conference, January 1994

  • Wu, Youfeng; Larus, James R.
  • Proceedings of the 27th annual international symposium on Microarchitecture - MICRO 27
  • DOI: 10.1145/192724.192725

Solution and Optimization of Systems of Pseudo-Boolean Constraints
journal, October 2007

  • Aloul, Fadi A.; Ramani, Arathi; Sakallah, Karem A.
  • IEEE Transactions on Computers, Vol. 56, Issue 10
  • DOI: 10.1109/TC.2007.1075

Parallel Sparse Supports for Array Intrinsic Functions of Fortran 90
journal, March 2001

  • Chang, Rong-Guey; Chuang, Tyng-Ruey; Lee, Jenq Kuen
  • The Journal of Supercomputing, Vol. 18, Issue 3, p. 305-339
  • DOI: 10.1023/A:1008113800183

Two Fast Algorithms for Sparse Matrices: Multiplication and Permuted Transposition
journal, September 1978

  • Gustavson, Fred G.
  • ACM Transactions on Mathematical Software, Vol. 4, Issue 3
  • DOI: 10.1145/355791.355796

Scalable Tensor Decompositions for Multi-aspect Data Mining
conference, December 2008

  • Kolda, Tamara G.; Sun, Jimeng
  • 2008 Eighth IEEE International Conference on Data Mining (ICDM)
  • DOI: 10.1109/ICDM.2008.89

On the Best Rank-1 and Rank-(R_1, R_2, ..., R_N) Approximation of Higher-Order Tensors
journal, January 2000

  • De Lathauwer, Lieven; De Moor, Bart; Vandewalle, Joos
  • SIAM Journal on Matrix Analysis and Applications, Vol. 21, Issue 4
  • DOI: 10.1137/S0895479898346995

Efficient data compression methods for multidimensional sparse array operations based on the EKMR scheme
journal, December 2003


Efficient representation scheme for multidimensional array operations
journal, March 2002

  • Lin, Chun-Yuan
  • IEEE Transactions on Computers, Vol. 51, Issue 3
  • DOI: 10.1109/12.990130

Solving SAT and SAT Modulo Theories: From an abstract Davis--Putnam--Logemann--Loveland procedure to DPLL(
journal, November 2006

  • Nieuwenhuis, Robert; Oliveras, Albert; Tinelli, Cesare
  • Journal of the ACM, Vol. 53, Issue 6
  • DOI: 10.1145/1217856.1217859