OSTI.GOV: U.S. Department of Energy, Office of Scientific and Technical Information

Title: Cache Locality Optimization for Recursive Programs

Abstract

We present an approach to optimize the cache locality for recursive programs by dynamically splicing--recursively interleaving--the execution of distinct function invocations. By utilizing data effect annotations, we identify concurrency and data reuse opportunities across function invocations and interleave them to reduce reuse distance. We present algorithms that efficiently track effects in recursive programs, detect interference and dependencies, and interleave execution of function invocations using user-level (non-kernel) lightweight threads. To enable multi-core execution, a program is parallelized using a nested fork/join programming model. Our cache optimization strategy is designed to work in the context of a random work stealing scheduler. We present an implementation using the MIT Cilk framework that demonstrates significant improvements in sequential and parallel performance, competitive with a state-of-the-art compile-time optimizer for loop programs and a domain-specific optimizer for stencil programs.
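To make the notion of reuse distance in the abstract concrete, the following is a minimal illustrative sketch, not the authors' algorithm. It assumes two hypothetical recursive invocations that each read the same array, and compares running them back to back with interleaving them at the leaves of the recursion. The paper's effect annotations, dependence detection, lightweight threads, and Cilk scheduler are not modeled; all function names here are invented for illustration.

```python
from typing import List, Optional, Sequence


def reuse_distances(trace: Sequence[int]) -> List[Optional[int]]:
    """For each access, count the distinct addresses touched since the
    previous access to the same address (None on a first access)."""
    last_seen = {}
    distances: List[Optional[int]] = []
    for i, addr in enumerate(trace):
        if addr in last_seen:
            between = trace[last_seen[addr] + 1 : i]
            distances.append(len(set(between)))
        else:
            distances.append(None)
        last_seen[addr] = i
    return distances


def max_reuse_distance(trace: Sequence[int]) -> int:
    """Largest finite reuse distance in the trace (0 if nothing is reused)."""
    return max((d for d in reuse_distances(trace) if d is not None), default=0)


def visit(lo: int, hi: int, leaf: int) -> List[int]:
    """One hypothetical recursive invocation: reads indices lo..hi-1."""
    if hi - lo <= leaf:
        return list(range(lo, hi))
    mid = (lo + hi) // 2
    return visit(lo, mid, leaf) + visit(mid, hi, leaf)


def back_to_back(n: int, leaf: int) -> List[int]:
    """Two invocations over the same array, run one after the other."""
    return visit(0, n, leaf) + visit(0, n, leaf)


def interleaved(lo: int, hi: int, leaf: int) -> List[int]:
    """The same two invocations, interleaved at every leaf of the recursion
    (assumes they are independent, so reordering is legal)."""
    if hi - lo <= leaf:
        chunk = list(range(lo, hi))
        return chunk + chunk  # first invocation's leaf, then the second's
    mid = (lo + hi) // 2
    return interleaved(lo, mid, leaf) + interleaved(mid, hi, leaf)


if __name__ == "__main__":
    n, leaf = 16, 2
    print("back to back:", max_reuse_distance(back_to_back(n, leaf)))
    print("interleaved: ", max_reuse_distance(interleaved(0, n, leaf)))
```

Both orderings perform exactly the same accesses; only the order differs. In the back-to-back trace every reuse is separated by the rest of the array, while in the interleaved trace a reuse is separated only by the rest of one leaf, so the data is much more likely to still be in cache when it is touched again.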

Authors:
Lifflander, Jonathan; Krishnamoorthy, Sriram
Publication Date:
2017-06-14
Research Org.:
Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1440662
Report Number(s):
PNNL-SA-123961
KJ0402000
DOE Contract Number:  
AC05-76RL01830
Resource Type:
Conference
Resource Relation:
Conference: Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2017), June 18-23, 2017, Barcelona, Spain, 1-16
Country of Publication:
United States
Language:
English

Citation Formats

Lifflander, Jonathan, and Krishnamoorthy, Sriram. Cache Locality Optimization for Recursive Programs. United States: N. p., 2017. Web. doi:10.1145/3140587.3062385.
Lifflander, Jonathan, & Krishnamoorthy, Sriram. Cache Locality Optimization for Recursive Programs. United States. https://doi.org/10.1145/3140587.3062385
Lifflander, Jonathan, and Krishnamoorthy, Sriram. 2017. "Cache Locality Optimization for Recursive Programs". United States. https://doi.org/10.1145/3140587.3062385.
@article{osti_1440662,
title = {Cache Locality Optimization for Recursive Programs},
author = {Lifflander, Jonathan and Krishnamoorthy, Sriram},
abstractNote = {We present an approach to optimize the cache locality for recursive programs by dynamically splicing--recursively interleaving--the execution of distinct function invocations. By utilizing data effect annotations, we identify concurrency and data reuse opportunities across function invocations and interleave them to reduce reuse distance. We present algorithms that efficiently track effects in recursive programs, detect interference and dependencies, and interleave execution of function invocations using user-level (non-kernel) lightweight threads. To enable multi-core execution, a program is parallelized using a nested fork/join programming model. Our cache optimization strategy is designed to work in the context of a random work stealing scheduler. We present an implementation using the MIT Cilk framework that demonstrates significant improvements in sequential and parallel performance, competitive with a state-of-the-art compile-time optimizer for loop programs and a domain-specific optimizer for stencil programs.},
doi = {10.1145/3140587.3062385},
url = {https://www.osti.gov/biblio/1440662},
place = {United States},
year = {2017},
month = {6}
}

Conference:
Other availability
Please see Document Availability for additional information on obtaining the full-text document. Library patrons may search WorldCat to identify libraries that hold this conference proceeding.

