Title: Cache Locality Optimization for Recursive Programs

Conference

We present an approach to optimize the cache locality for recursive programs by dynamically splicing--recursively interleaving--the execution of distinct function invocations. By utilizing data effect annotations, we identify concurrency and data reuse opportunities across function invocations and interleave them to reduce reuse distance. We present algorithms that efficiently track effects in recursive programs, detect interference and dependencies, and interleave execution of function invocations using user-level (non-kernel) lightweight threads. To enable multi-core execution, a program is parallelized using a nested fork/join programming model. Our cache optimization strategy is designed to work in the context of a random work stealing scheduler. We present an implementation using the MIT Cilk framework that demonstrates significant improvements in sequential and parallel performance, competitive with a state-of-the-art compile-time optimizer for loop programs and a domain-specific optimizer for stencil programs.
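
To make the notion of reuse distance concrete, the following is a minimal Python sketch, not the paper's algorithm or its Cilk implementation. It assumes two hypothetical read-only recursive invocations over the same 16-element array, models effects as plain sets of indices, and compares the reuse distance of back-to-back execution with a simple leaf-level interleaving. The names `reuse_distances`, `traverse`, and `interferes` are illustrative and do not come from the paper.

```python
from typing import Hashable, List, Sequence, Set


def reuse_distances(trace: Sequence[Hashable]) -> List[int]:
    """For each repeated access, count the distinct addresses touched
    since the previous access to the same address."""
    last_seen = {}
    distances = []
    for i, addr in enumerate(trace):
        if addr in last_seen:
            distances.append(len(set(trace[last_seen[addr] + 1:i])))
        last_seen[addr] = i
    return distances


def traverse(lo: int, hi: int, leaf: int) -> List[List[int]]:
    """Hypothetical recursive read-only traversal of indices [lo, hi).
    Returns the leaf blocks in the order the recursion visits them."""
    if hi - lo <= leaf:
        return [list(range(lo, hi))]
    mid = (lo + hi) // 2
    return traverse(lo, mid, leaf) + traverse(mid, hi, leaf)


def interferes(reads_a: Set[int], writes_a: Set[int],
               reads_b: Set[int], writes_b: Set[int]) -> bool:
    """Assumed effect model: two invocations conflict if either one
    writes data that the other reads or writes."""
    return bool(writes_a & (reads_b | writes_b)) or bool(writes_b & reads_a)


n, leaf = 16, 4
f_blocks = traverse(0, n, leaf)  # first invocation
g_blocks = traverse(0, n, leaf)  # second invocation over the same data

# Both invocations only read, so they do not interfere and may be interleaved.
all_reads = set(range(n))
assert not interferes(all_reads, set(), all_reads, set())

# Back-to-back execution: f runs to completion before g starts.
sequential = [a for b in f_blocks for a in b] + [a for b in g_blocks for a in b]

# Interleaved execution: alternate the leaf blocks of f and g.
spliced = [a for fb, gb in zip(f_blocks, g_blocks) for a in fb + gb]

max_seq = max(reuse_distances(sequential))
max_spliced = max(reuse_distances(spliced))
print(max_seq, max_spliced)  # prints: 15 3
```

In this toy trace the largest reuse distance drops from 15 (the whole array minus one) to 3 (one leaf block minus one), which is the kind of reduction the interleaving is meant to achieve. If either invocation wrote to the shared range, `interferes` would report a conflict and the interleaving would not be safe under this model.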

Research Organization:
Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
Sponsoring Organization:
USDOE
DOE Contract Number:
AC05-76RL01830
OSTI ID:
1440662
Report Number(s):
PNNL-SA-123961; KJ0402000
Resource Relation:
Conference: Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2017), June 18-23, 2017, Barcelona, Spain, 1-16
Country of Publication:
United States
Language:
English

