Cache Locality Optimization for Recursive Programs
We present an approach to optimize the cache locality for recursive programs by dynamically splicing--recursively interleaving--the execution of distinct function invocations. By utilizing data effect annotations, we identify concurrency and data reuse opportunities across function invocations and interleave them to reduce reuse distance. We present algorithms that efficiently track effects in recursive programs, detect interference and dependencies, and interleave execution of function invocations using user-level (non-kernel) lightweight threads. To enable multi-core execution, a program is parallelized using a nested fork/join programming model. Our cache optimization strategy is designed to work in the context of a random work stealing scheduler. We present an implementation using the MIT Cilk framework that demonstrates significant improvements in sequential and parallel performance, competitive with a state-of-the-art compile-time optimizer for loop programs and a domain-specific optimizer for stencil programs.
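The idea of using effect annotations to decide which invocations may be interleaved can be illustrated with a small sketch. This is not the paper's algorithm or its Cilk implementation: the `Region` and `Effects` types, the half-open index ranges, and the helpers `interferes`, `shares_data`, and `can_interleave` are all hypothetical names chosen for illustration. The sketch assumes each invocation declares the array ranges it reads and writes, treats two invocations as interfering when a write of one overlaps any access of the other, and treats them as a reuse opportunity when they touch a common range without interfering.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Region:
    """Hypothetical effect region: half-open index range [lo, hi) of a named array."""
    array: str
    lo: int
    hi: int

    def overlaps(self, other: "Region") -> bool:
        return self.array == other.array and self.lo < other.hi and other.lo < self.hi


@dataclass
class Effects:
    """Hypothetical effect annotation: the regions one invocation reads and writes."""
    reads: list = field(default_factory=list)
    writes: list = field(default_factory=list)


def _any_overlap(xs, ys) -> bool:
    return any(x.overlaps(y) for x in xs for y in ys)


def interferes(a: Effects, b: Effects) -> bool:
    """True if a write in one invocation overlaps a read or write in the other."""
    return (_any_overlap(a.writes, b.writes)
            or _any_overlap(a.writes, b.reads)
            or _any_overlap(a.reads, b.writes))


def shares_data(a: Effects, b: Effects) -> bool:
    """True if the invocations touch a common region (a data reuse opportunity)."""
    return _any_overlap(a.reads + a.writes, b.reads + b.writes)


def can_interleave(a: Effects, b: Effects) -> bool:
    """Interleaving is worthwhile and safe: shared data, but no conflicting access."""
    return shares_data(a, b) and not interferes(a, b)


# Illustrative invocations (assumed, not taken from the paper).
f = Effects(reads=[Region("A", 0, 8)], writes=[Region("B", 0, 8)])
g = Effects(reads=[Region("A", 0, 8)], writes=[Region("C", 0, 8)])
h = Effects(reads=[Region("B", 4, 12)], writes=[Region("D", 0, 8)])

print(can_interleave(f, g))  # f and g both only read A[0:8]
print(interferes(f, h))      # h reads part of what f writes
```

In this sketch `f` and `g` are candidates for interleaving because they read the same range of `A` and write disjoint arrays, while `h` depends on `f` because it reads a range that `f` writes and so must be ordered after it.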
- Research Organization: Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
- Sponsoring Organization: USDOE
- DOE Contract Number: AC05-76RL01830
- OSTI ID: 1440662
- Report Number(s): PNNL-SA-123961; KJ0402000
- Resource Relation: Conference: Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2017), June 18-23, 2017, Barcelona, Spain, 1-16
- Country of Publication: United States
- Language: English