
- Derivation and Performance of a Pipelined Transaction Processor
- Reproducing inter-process synchronization for performance
- Efficient Interprocedural Data Placement Optimisation in a Parallel Library
- Runtime Interprocedural Data Placement Optimisation for Lazy Parallel Libraries
- A Linear Algebra Formulation for Optimising Replication in Data Parallel Programs
- WORKLOAD CHARACTERIZATION USING LIGHTWEIGHT SYSTEM CALL TRACING AND REEXECUTION
- Efficient shared-memory support for parallel graph reduction
- Abstract, declarative control over partitioning in parallel functional
- Optimising Shared Reduction Variables in MPI. A.J. Field, P.H.J. Kelly and T.L. Hansen
- DYNAMIC INSTRUMENTATION FOR JAVA USING A VIRTUAL JVM
- High-Performance SIMT Code Generation in an Active Visual Effects Library
- Instant-access cycle-stealing for parallel applications requiring interactive response
- Reactive Proxies: a Flexible Protocol Extension to Reduce ccNUMA
- Parallel Programming Using Skeleton J. Darlington, A.J. Field, P.G. Harrison,
- Design and Implementation of an Object-Orientated 64-bit Single Address
- Eliminating Invalidation in Coherent-Cache Parallel Graph Reduction
- (Statement) Assign Generalised Image Filtering Performance (1 Pass)
- Experiments with Parallelising Numerical Applications via DESOLibraries
- Is Morton layout competitive for large two-dimensional arrays?
- Angel: Resource Unification in a 64-bit Micro-Kernel
- Adaptive Proxies: Handling Widely-Shared Data in Shared-Memory Multiprocessors
- A Domain-Specific Interpreter for Parallelizing a Large Mixed-Language Visualisation Application
- A Review of Data Placement Optimisation for Data-Parallel Component Composition
- OPTIMISING COMPONENT COMPOSITION USING INDEXED DEPENDENCE METADATA. Lee W. Howes, Anton Lokhmotov, Paul H. J. Kelly, A. J. Field
- Deriving Efficient Data Movement From Decoupled Access/Execute Specifications
- Tracing and Reexecuting Operating System Calls for Reproducible Performance Experiments
- Cautious, Machine-Independent Performance Tuning for Shared-Memory Multiprocessors
- Run-time code generation in C++ as a foundation for domain-specific optimisation
- Online Cycle Detection and Difference Propagation for Pointer Analysis. David J. Pearce, Paul H.J. Kelly and Chris Hankin
- Explicit Dependence Metadata in an Active Visual Effects Library
- A dynamic algorithm for topologically sorting directed acyclic graphs
- Concurrency and Computation: Practice and Experience (Concurrency Computat.: Pract. Exper. 2004)
- Search strategies for Java bottleneck location by dynamic instrumentation
- Parallel Processing Letters, © World Scientific Publishing Company
- Professor Paul H. J. Kelly, Professor of Software Technology Inaugural lecture: Over and over again: the discipline of
- Improving the Performance of Morton Layout by Array Alignment and Loop Unrolling
- MEProf: Modular Extensible Profiling for Eclipse. Marc Hull, Olav Beckmann, Paul H. J. Kelly
- Backwards-compatible bounds checking for arrays and pointers in C. Richard W M Jones and Paul H J Kelly
- Optimising Java RMI Programs by Communication Restructuring
- Generating CUDA code at runtime: specializing accelerator code to runtime data
- Performance prediction of paging workloads using lightweight tracing. Ariel N Burton
- Minimizing Associativity Conflicts in Morton Layout. Jeyarajan Thiyagalingam, Olav Beckmann, and Paul H. J. Kelly
- Profile-directed speculative optimization of reconfigurable floating point data paths
- Towards Metaprogramming for Parallel Systems on a Chip
- Profiling floating point value ranges for reconfigurable
- DESOLA: an Active Linear Algebra Library Using Delayed Evaluation and Runtime Code
- Delayed Evaluation, Self-Optimising Software Components as a Programming Model
- A Declarative Framework for Analysis and Optimization
- An exhaustive evaluation of row-major, column-major and Morton layouts for large two-dimensional arrays
- Generating Hardware Designs by Source Code Transformation
- Towards generating optimised finite element solvers for GPUs from high-level specifications
- Inference of session types from control flow. Peter Collingbourne
- Improving GNU Compiler Collection Infrastructure for Streamization. Antoniu Pop
- A Dynamic Topological Sort Algorithm for Directed Acyclic Graphs
- Accelerating a C++ Image Processing Library with a GPU
- Efficient Field-Sensitive Pointer Analysis for C. David J. Pearce
- STABLE PERFORMANCE FOR CC-NUMA USING FIRST TOUCH PAGE
- Data Distribution at Run-Time: Re-Using Execution Plans
- M-Tree: A Parallel Abstract Data Type for Block-Irregular Adaptive Applications
- Software Performance Optimisation Group, Imperial College, London
- Software abstractions for many-core software engineering