Cache Locality Optimization for Recursive Programs
Abstract
We present an approach to optimize the cache locality for recursive programs by dynamically splicing--recursively interleaving--the execution of distinct function invocations. By utilizing data effect annotations, we identify concurrency and data reuse opportunities across function invocations and interleave them to reduce reuse distance. We present algorithms that efficiently track effects in recursive programs, detect interference and dependencies, and interleave execution of function invocations using user-level (non-kernel) lightweight threads. To enable multi-core execution, a program is parallelized using a nested fork/join programming model. Our cache optimization strategy is designed to work in the context of a random work stealing scheduler. We present an implementation using the MIT Cilk framework that demonstrates significant improvements in sequential and parallel performance, competitive with a state-of-the-art compile-time optimizer for loop programs and a domain-specific optimizer for stencil programs.
- Authors:
- Lifflander, Jonathan; Krishnamoorthy, Sriram
- Publication Date:
- 2017-06-14
- Research Org.:
- Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
- Sponsoring Org.:
- USDOE
- OSTI Identifier:
- 1440662
- Report Number(s):
- PNNL-SA-123961; KJ0402000
- DOE Contract Number:
- AC05-76RL01830
- Resource Type:
- Conference
- Resource Relation:
- Conference: Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2017), June 18-23, 2017, Barcelona, Spain, 1-16
- Country of Publication:
- United States
- Language:
- English
Citation Formats
Lifflander, Jonathan, and Krishnamoorthy, Sriram. Cache Locality Optimization for Recursive Programs. United States: N. p., 2017. Web. doi:10.1145/3140587.3062385.
Lifflander, Jonathan, & Krishnamoorthy, Sriram. Cache Locality Optimization for Recursive Programs. United States. https://doi.org/10.1145/3140587.3062385
Lifflander, Jonathan, and Krishnamoorthy, Sriram. 2017. "Cache Locality Optimization for Recursive Programs". United States. https://doi.org/10.1145/3140587.3062385.
@article{osti_1440662,
title = {Cache Locality Optimization for Recursive Programs},
author = {Lifflander, Jonathan and Krishnamoorthy, Sriram},
abstractNote = {We present an approach to optimize the cache locality for recursive programs by dynamically splicing--recursively interleaving--the execution of distinct function invocations. By utilizing data effect annotations, we identify concurrency and data reuse opportunities across function invocations and interleave them to reduce reuse distance. We present algorithms that efficiently track effects in recursive programs, detect interference and dependencies, and interleave execution of function invocations using user-level (non-kernel) lightweight threads. To enable multi-core execution, a program is parallelized using a nested fork/join programming model. Our cache optimization strategy is designed to work in the context of a random work stealing scheduler. We present an implementation using the MIT Cilk framework that demonstrates significant improvements in sequential and parallel performance, competitive with a state-of-the-art compile-time optimizer for loop programs and a domain-specific optimizer for stencil programs.},
doi = {10.1145/3140587.3062385},
url = {https://www.osti.gov/biblio/1440662},
journal = {},
place = {United States},
year = {2017},
month = {6}
}
Works referenced in this record:
Executing task graphs using work-stealing
conference, April 2010
- Agrawal, Kunal; Leiserson, Charles E.; Sukha, Jim
- 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)
A practical automatic polyhedral parallelizer and locality optimizer
conference, January 2008
- Bondhugula, Uday; Hartono, Albert; Ramanujam, J.
- Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation - PLDI '08
Programming with exceptions in JCilk
journal, December 2006
- Danaher, John S.; Angelina Lee, I. -Ting; Leiserson, Charles E.
- Science of Computer Programming, Vol. 63, Issue 2
Scheduling threads for constructive cache sharing on CMPs
conference, January 2007
- Chen, Shimin; Mowry, Todd C.; Wilkerson, Chris
- Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures - SPAA '07
An annotation language for optimizing software libraries
conference, January 1999
- Guyer, Samuel Z.; Lin, Calvin
- Proceedings of the 2nd conference on Domain-specific languages - PLAN '99
The Pochoir stencil compiler
conference, January 2011
- Tang, Yuan; Chowdhury, Rezaul Alam; Kuszmaul, Bradley C.
- Proceedings of the 23rd ACM symposium on Parallelism in algorithms and architectures - SPAA '11
Diamond Tiling: Tiling Techniques to Maximize Parallelism for Stencil Computations
journal, May 2017
- Bondhugula, Uday; Bandishti, Vinayaka; Pananilath, Irshad
- IEEE Transactions on Parallel and Distributed Systems, Vol. 28, Issue 5
A Java fork/join framework
conference, January 2000
- Lea, Doug
- Proceedings of the ACM 2000 conference on Java Grande - JAVA '00
SLAW: a scalable locality-aware adaptive work-stealing scheduler for multi-core systems
conference, January 2010
- Guo, Yi; Zhao, Jisheng; Cave, Vincent
- Proceedings of the 15th ACM SIGPLAN symposium on Principles and practice of parallel programming - PPoPP '10
Legion: Expressing locality and independence with logical regions
conference, November 2012
- Bauer, Michael; Treichler, Sean; Slaughter, Elliott
- 2012 SC - International Conference for High Performance Computing, Networking, Storage and Analysis
Optimizing Data Locality for Fork/Join Programs Using Constrained Work Stealing
conference, November 2014
- Lifflander, Jonathan; Krishnamoorthy, Sriram; Kale, Laxmikant V.
- SC14: International Conference for High Performance Computing, Networking, Storage and Analysis
A work-stealing scheduler for X10's task parallelism with suspension
conference, January 2012
- Tardieu, Olivier; Wang, Haichuan; Lin, Haibo
- Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming - PPoPP '12
Qthreads: An API for programming with millions of lightweight threads
conference, April 2008
- Wheeler, Kyle B.; Murphy, Richard C.; Thain, Douglas
- 2008 IEEE International Symposium on Parallel and Distributed Processing (IPDPS)
Enhancing locality for recursive traversals of recursive structures
conference, January 2011
- Jo, Youngjoon; Kulkarni, Milind
- Proceedings of the 2011 ACM international conference on Object oriented programming systems languages and applications - OOPSLA '11
Delinearization: an efficient way to break multiloop dependence equations
conference, January 1992
- Maslov, Vadim
- Proceedings of the ACM SIGPLAN 1992 conference on Programming language design and implementation - PLDI '92
Composable Parallel Patterns with Intel Cilk Plus
journal, March 2013
- Robison, Arch D.
- Computing in Science & Engineering, Vol. 15, Issue 2
Data locality and load balancing in COOL
conference, January 1993
- Chandra, Rohit; Gupta, Anoop; Hennessy, John L.
- Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming - PPoPP '93
Design of a separable transition-diagram compiler
journal, July 1963
- Conway, Melvin E.
- Communications of the ACM, Vol. 6, Issue 7
Thread scheduling for cache locality
conference, January 1996
- Philbin, James; Edler, Jan; Anshus, Otto J.
- Proceedings of the seventh international conference on Architectural support for programming languages and operating systems - ASPLOS-VII
First-class user-level threads
conference, January 1991
- Marsh, Brian D.; Scott, Michael L.; LeBlanc, Thomas J.
- Proceedings of the thirteenth ACM symposium on Operating systems principles - SOSP '91
The implementation of the Cilk-5 multithreaded language
conference, January 1998
- Frigo, Matteo; Leiserson, Charles E.; Randall, Keith H.
- Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation - PLDI '98
The tasks with effects model for safe concurrency
conference, January 2013
- Heumann, Stephen T.; Adve, Vikram S.; Wang, Shengjie
- Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming - PPoPP '13
A Transformation System for Developing Recursive Programs
journal, January 1977
- Burstall, R. M.; Darlington, John
- Journal of the ACM, Vol. 24, Issue 1
Symbolic bounds analysis of pointers, array indices, and accessed memory regions
journal, March 2005
- Rugina, Radu; Rinard, Martin C.
- ACM Transactions on Programming Languages and Systems, Vol. 27, Issue 2
Pointer analysis for structured parallel programs
journal, January 2003
- Rugina, Radu; Rinard, Martin C.
- ACM Transactions on Programming Languages and Systems, Vol. 25, Issue 1
Language support for dynamic, hierarchical data partitioning
conference, January 2013
- Treichler, Sean; Bauer, Michael; Aiken, Alex
- Proceedings of the 2013 ACM SIGPLAN international conference on Object oriented programming systems languages & applications - OOPSLA '13
Telescoping Languages: A Strategy for Automatic Generation of Scientific Problem-Solving Systems from Annotated Libraries
journal, December 2001
- Kennedy, Ken; Broom, Bradley; Cooper, Keith
- Journal of Parallel and Distributed Computing, Vol. 61, Issue 12
Concurrent Collections
journal, January 2010
- Budimlić, Zoran; Burke, Michael; Cavé, Vincent
- Scientific Programming, Vol. 18, Issue 3-4
Supermatrix out-of-order scheduling of matrix operations for SMP and multi-core architectures
conference, January 2007
- Chan, Ernie; Quintana-Orti, Enrique S.; Quintana-Orti, Gregorio
- Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures - SPAA '07
Automatic parallelization of divide and conquer algorithms
conference, January 1999
- Rugina, Radu; Rinard, Martin
- Proceedings of the seventh ACM SIGPLAN symposium on Principles and practice of parallel programming - PPoPP '99