
- Integer Lattice Based Methods for Local Address Generation for Block-Cyclic
- Proceedings of the 1998 International Conference on Parallel Processing (ICPP 98) Minimizing Data and Synchronization Costs in One-Way Communication
- To appear in Proceedings of the Seventh SIAM Conference on Parallel Processing for Scientific Computing, San Francisco, February 1995.
- The Science of Programming High-Performance Linear Algebra Libraries Paolo Bientinesi, John A. Gunnels, Fred G. Gustavson, Greg M. Henry
- A Component Architecture for High-Performance Computing David E. Bernholdt, Wael R. Elwasif, and James A. Kohl
- Improving Offset Assignment on Embedded Processors Using Transformations
- An Efficient Compile-Time Approach to Compute Address Sequences in Data Parallel Programs
- AUTOMATIC DATA AND COMPUTATION MAPPING FOR DISTRIBUTED MEMORY MACHINES
- EXACT SCHEDULING TECHNIQUES FOR HIGH LEVEL Submitted to the Graduate Faculty of the
- Compile-Time Techniques for Data Distribution in Distributed Memory Machines
- Tiling Multidimensional Iteration Spaces for Multicomputers
- Multi-Phase Redistribution: A Communication-Efficient Approach to Array Redistribution
- Beyond Unimodular Transformations J. Ramanujam
- Final version of paper to appear in IEEE Trans. Parallel and Dist. Systems, Feb 1999 A Linear Algebra Framework for Automatic Determination of
- Proc. 7th International Workshop on Compilers for Parallel Computers, Sweden, June 1998 Optimizing Spatial Locality in Loop Nests using Linear Algebra
- Proc. Workshop on Architecture Compiler Interaction, 3rd HPCA, San Antonio, Feb. 1997 Optimizing Out-of-Core Computations in Uniprocessors
- Reducing Code Size Through Address Register Assignment
- Compiler Algorithms for Optimizing Locality and Parallelism on Shared and Distributed Memory Machines
- Automatic Data Mapping and Program Transformations J. Ramanujam and A. Narayan
- Data Locality Optimization for Synthesis of Efficient Out-of-Core Algorithms
- Aiding Library Writers in Optimizing the Use of High-Level Abstractions in Scientific Applications
- Appears in IEEE Trans. on Parallel and Distributed Systems, December 2000 Minimizing data and synchronization costs in one-way
- Enhancing Spatial Locality using Data Layout Optimizations M. Kandemir, A. Choudhary, J. Ramanujam, N. Shenoy, and P. Banerjee
- A Linear Algebraic View of Loop Transformations and Their Interaction
- Exploiting Domain Specific High-level Runtime Support for Parallel Code Generation
- Experiments with Data Layouts M. Kandemir, A. Choudhary, N. Shenoy, P. Banerjee, and J. Ramanujam
- Automatic Transformations for Communication-Minimized Parallelization and Locality
- Fast Address Sequence Generation for Data-Parallel Programs Using Integer Lattices
- A Matrix-Based Approach to the Global Locality Optimization Problem M. Kandemir, A. Choudhary, J. Ramanujam, P. Banerjee
- Compiler Support for Software Libraries Samuel Z. Guyer Calvin Lin
- Effective Automatic Parallelization of Stencil Computations Sriram Krishnamoorthy
- Generating Parallel Programs for Fast Signal Transforms using SPIRAL
- A Framework for Integrated Communication and I/O Placement
- In Proc. 8th Workshop on Compilers for Parallel Computing (CPC 98), Sweden, June 1998 Advanced Compilation Techniques for HPF
- Statement-Level Independent Partitioning of Uniform Recurrences J. Ramanujam, S. Vasanthakumar
- Automatic Optimization of Communication in Compiling Out-of-core Stencil Rajesh Bordawekar, Alok Choudhary
- Nonunimodular Transformations of Nested Loops J. Ramanujam
- Efficient Computation of Address Sequences in Data Parallel Programs Using Closed Forms for Basis Vectors
- Improving locality in out-of-core computations using data layout transformations
- Optimization of Out-of-Core Computations Using Chain Vectors
- Cluster Partitioning Approaches to Mapping Parallel Programs Onto a Hypercube
- Automatic Performance Tuning and Analysis of Sparse Triangular Solve
- Tiling Multidimensional Iteration Spaces for Nonshared Memory Machines J. Ramanujam P. Sadayappan
- Accepted for publication in IEEE Trans. Computer Aided Design (TCAD), 2000 Compact and Efficient Code Generation through
- A Unified Tiling Approach for Out-Of-Core Computations M. Kandemir
- Generalized Overlap Regions for Communication Optimization in Data-Parallel Programs
- OPTIMIZING DATA LOCALITY AND PARALLELISM FOR SCALABLE MULTIPROCESSORS
- CODE GENERATION FOR COMPLEX SUBSCRIPTS IN DATA-PARALLEL PROGRAMS
- Towards Effective Automatic Parallelization for Multicore Systems Uday Bondhugula
- Iteration Space Tiling for Distributed Memory Machines J. Ramanujam and P. Sadayappan
- Efficient Address Sequence Generation for Two-Level Mappings in High Performance Fortran
- A Graph Based Framework to Detect Optimal Memory Layouts for Improving Data Locality
- A Compiler Algorithm for Optimizing Locality in Loop Nests M. Kandemir, J. Ramanujam, A. Choudhary
- Data Relation Vectors: A New Abstraction for Data Optimizations Mahmut Kandemir
- A Fast Approach to Computing Exact Solutions to the Resource-Constrained Scheduling Problem
- To appear in Proc. Europar 98, Southampton, UK, September 1998 Enhancing spatial locality via data layout optimizations
- Parallel Processing Letters, © World Scientific Publishing Company
- Parallel Objects: Virtualization and in-Process Components Laxmikant Kale Orion Lawlor Milind Bhandarkar
- IEEE TRANSACTIONS ON COMPUTERS, VOL. XX, NO. Y, MONTH 1999 Improving Cache Locality by a Combination of
- A Combined Communication and Synchronization Optimization Algorithm for One-Way Communication
- A Compiler Framework for Optimization of Affine Loop Nests for General Purpose Computations on GPUs
- A Neural Architecture for a Class of Abduction Problems Ashok K. Goel, J. Ramanujam
- Code Generation for Complex Subscripts in Data-Parallel Programs
- EFFICIENT ADDRESS AND COMMUNICATION GENERATION FOR DATA-PARALLEL
- A Practical Automatic Polyhedral Parallelizer and Locality Optimizer
- A Data Layout Optimization Technique Based on Hyperplanes M. Kandemir, A. Choudhary, N. Shenoy, P. Banerjee, and J. Ramanujam
- 1990 International Conference on Parallel Processing, Volume II, pages 179-186 Tiling of Iteration Spaces for Multicomputers
- Task Allocation onto a Hypercube by Recursive Mincut Bipartitioning
- EFFICIENT TECHNIQUES FOR PROBLEMS IN HIGH-LEVEL SYNTHESIS
- In Proceedings of the 1998 International Parallel Processing Symposium A Generalized Framework for Global Communication Optimization
- Optimizing Communication Using Global Dataflow Analysis M. Kandemir, P. Banerjee, A. Choudhary, J. Ramanujam, and N. Shenoy
- A Global Communication Optimization Technique Based on Data-Flow Analysis and Linear Algebra
- Tighter lower bounds for scheduling problems in high-level synthesis
- SPIRAL: A Generator for Platform-Adapted Libraries of Signal Processing Algorithms
- Compilation Techniques for Out-of-Core Parallel Computations M. Kandemir, A. Choudhary, J. Ramanujam, R. Bordawekar
- A Performance Optimization Framework for Compilation of Tensor Contraction Expressions into Parallel Programs
- Code Size Optimization for Embedded Processors using Commutative Transformations
- Address Register Assignment for Reducing Code M. Kandemir
- Address Code and Arithmetic Optimizations for Embedded Systems J. Ramanujam Satish Krishnamurthy Jinpyo Hong Mahmut Kandemir
- Improving Offset Assignment on Embedded Processors using Transformations
- Improving the Computational Performance of ILP-based Problems M. Narasimhan, J. Ramanujam
- Improving the Performance of Out-of-Core Computations M. Kandemir, J. Ramanujam, A. Choudhary
- Lessons learned from the Shared Memory Parallelization of a Functional Array Language
- Simultaneous Peak and Average Power Optimization in Synchronous Sequential Designs Using Retiming and Multiple Supply Voltages
- PLuTo: A Practical and Fully Automatic Polyhedral Program Optimization System
- Automatic Data Movement and Computation Mapping for Multi-level Parallel Architectures with Explicitly Managed
- CONCURRENCY AND COMPUTATION: PRACTICE AND EXPERIENCE Concurrency Computat.: Pract. Exper. 2007; 19:2425-2443
- Automatic Mapping of Nested Loops to FPGAs Uday Bondhugula
- Memory Offset Assignment for DSPs Jinpyo Hong
- An Effective Heuristic for Simple Offset Assignment with Variable Coalescing
- Improving the Energy Behavior of Block Buffering Using Compiler Optimizations
- Automatic Code Generation for Many-Body Electronic Structure Methods: The Tensor
- Dynamic Memory Usage Optimization using ILP and J. Ramanujam
- ILP and Iterative LP Solutions for Peak and Average Power Optimization in HLS
- Modified Force-Directed Scheduling for Peak and Average Power Optimization using Multiple Supply-Voltages
- Efficient Search-Space Pruning for Integrated Fusion and Tiling Transformations
- Performance Modeling and Optimization of Parallel Out-of-Core Tensor Contractions
- IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 23, NO. 2, FEBRUARY 2004 243 A Compiler-Based Approach for Dynamically
- Memory-Constrained Communication Minimization for a Class of Array Computations
- A Heuristic for Clock Selection in High-Level Synthesis J. Ramanujam Sandeep Deshpande Jinpyo Hong Mahmut Kandemir
- Global Communication Optimization for Tensor Contraction Expressions under Memory Constraints
- A High-Level Approach to Synthesis of High-Performance Codes for Quantum Chemistry
- Towards Automatic Synthesis of High-Performance Codes for Electronic Structure Calculations: Data
- Combining Distributed and Shared Memory Models: Approach and Evolution of the Global Arrays Toolkit
- A Parallel Communication Infrastructure for STAPL Steven Saunders
- Improving Offset Assignment for Embedded Processors Sunil Atri, J. Ramanujam, Mahmut Kandemir
- A Unified Compiler Algorithm for Optimizing Locality, Parallelism and Communication in Out-of-Core Computations
- Mapping Combinatorial Optimization Problems onto Neural Networks
- A Methodology for Parallelizing Programs for Multicomputers and Complex Memory Multiprocessors
- Proc. 11th ACM International Conference on Supercomputing, Melbourne, Australia, July 1998 A Hyperplane Based Approach for Optimizing Spatial Locality in Loop Nests
- On Lower Bounds for Scheduling Problems in High-Level Synthesis M. Narasimhan
- Analysis of Event Synchronization in Parallel Programs J. Ramanujam and A. Mathew
- Affine Transformations for Communication Minimal Parallelization and Locality Optimization of Arbitrarily Nested Loop Sequences
- A Compiler Framework for Optimization of Affine Loop Nests for GPGPUs
- Estimating and Reducing the Memory Requirements of Signal Processing Codes for Embedded Systems
- IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 12, NO. 3, MARCH 2004 281 Compiler-Directed Scratch Pad Memory Optimization
- To appear in Proceedings of the Third Workshop on Languages, Compilers and Runtime Systems for Scalable Computers, Troy, NY, May 1995
- Appears in Journal of Parallel and Distributed Computing, September 1999 A Matrix-Based Approach to Global Locality Optimization
- Address Sequence Generation for Data-Parallel Programs Using Integer Lattices
- Access based data decomposition for distributed memory machines J. Ramanujam P. Sadayappan
- Compilation and Communication Strategies for Out-of-core programs on Distributed Memory Machines
- COMPILATION AND RUNTIME TECHNIQUES FOR DATA-PARALLEL PROGRAMS
- INTEGRATING DISTRIBUTION
- Compiler Support for Optimizing Tensor Contraction Expressions in Quantum Chemistry Computations
- Improving Offset Assignment for Embedded Processors Sunil Atri
- EE, CompE and CS Programs: Merger or Peaceful Co-Existence?