Cache Locality Optimization for Recursive Programs
We present an approach to optimizing the cache locality of recursive programs by dynamically splicing (recursively interleaving) the execution of distinct function invocations. Using data effect annotations, we identify concurrency and data reuse opportunities across function invocations and interleave those invocations to reduce reuse distance. We present algorithms that efficiently track effects in recursive programs, detect interference and dependences, and interleave the execution of function invocations using user-level (non-kernel) lightweight threads. To enable multi-core execution, a program is parallelized using a nested fork/join programming model. Our cache optimization strategy is designed to work in the context of a randomized work-stealing scheduler. We present an implementation using the MIT Cilk framework that demonstrates significant improvements in sequential and parallel performance, competitive with a state-of-the-art compile-time optimizer for loop programs and a domain-specific optimizer for stencil programs.
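To make the splicing idea concrete, here is a minimal sequential sketch in Python. It is not the Cilk implementation described above. Python generators stand in for user-level lightweight threads, each leaf of a divide-and-conquer reduction is treated as a switch point, and effects are assumed to be simple read/write intervals over a named array. All names (`Effect`, `interferes`, `visit`, `run_pair`, `max_reuse_gap`) are hypothetical. Two invocations that only read the same array do not interfere, so they can be spliced, and splicing brings the two touches of each data block next to each other in the trace.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List, Sequence, Tuple

Block = Tuple[str, int, int]  # (invocation name, lo, hi) touched at one leaf


@dataclass(frozen=True)
class Effect:
    """Hypothetical effect annotation: mode 'r' or 'w' on array[lo:hi] (half-open)."""
    array: str
    lo: int
    hi: int
    mode: str


def interferes(ea: Sequence[Effect], eb: Sequence[Effect]) -> bool:
    """Conflict iff two effects overlap on the same array and at least one writes."""
    return any(
        x.array == y.array and x.lo < y.hi and y.lo < x.hi and "w" in (x.mode, y.mode)
        for x in ea
        for y in eb
    )


def visit(name: str, data: List[float], lo: int, hi: int, leaf: int,
          f: Callable[[float], float], acc: List[float]) -> Iterator[Block]:
    """Recursive reduction of f over data[lo:hi]; each leaf yields (a switch point)."""
    if hi - lo <= leaf:
        acc[0] += sum(f(x) for x in data[lo:hi])
        yield (name, lo, hi)
        return
    mid = (lo + hi) // 2
    yield from visit(name, data, lo, mid, leaf, f, acc)
    yield from visit(name, data, mid, hi, leaf, f, acc)


def run_pair(g1: Iterator[Block], g2: Iterator[Block], splice: bool) -> List[Block]:
    """Splice two invocations round-robin, or run them back to back; return the trace."""
    if not splice:
        return list(g1) + list(g2)
    trace: List[Block] = []
    live = [g1, g2]
    while live:
        for g in list(live):
            step = next(g, None)
            if step is None:
                live.remove(g)  # this invocation has finished
            else:
                trace.append(step)
    return trace


def max_reuse_gap(trace: List[Block]) -> int:
    """Largest number of leaf steps between two touches of the same data block."""
    last, gap = {}, 0
    for i, (_, lo, hi) in enumerate(trace):
        if (lo, hi) in last:
            gap = max(gap, i - last[(lo, hi)])
        last[(lo, hi)] = i
    return gap


# Usage: a sum and a sum of squares that both only read array A.
A = [float(i) for i in range(16)]
total, squares = [0.0], [0.0]
safe = not interferes([Effect("A", 0, 16, "r")], [Effect("A", 0, 16, "r")])
trace = run_pair(visit("sum", A, 0, 16, 4, lambda x: x, total),
                 visit("sq", A, 0, 16, 4, lambda x: x * x, squares),
                 splice=safe)
print(total[0], squares[0], max_reuse_gap(trace))  # 120.0 1240.0 1
```

Running the same pair with `splice=False` yields a maximum gap of 4 (every block of `A` is revisited only after the whole first invocation finishes), whereas the spliced trace revisits each block on the very next step. A write effect overlapping the other invocation's reads would make `interferes` return `True`, in which case this sketch falls back to running the invocations back to back.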
- Research Organization:
- Pacific Northwest National Laboratory (PNNL), Richland, WA (US)
- Sponsoring Organization:
- USDOE
- DOE Contract Number:
- AC05-76RL01830
- OSTI ID:
- 1440662
- Report Number(s):
- PNNL-SA-123961; KJ0402000
- Country of Publication:
- United States
- Language:
- English