Cache Locality Optimization for Recursive Programs

Lifflander, Jonathan; Krishnamoorthy, Sriram

doi:10.1145/3140587.3062385

Conference · Wed Jun 14 04:00:00 EDT 2017

DOI: https://doi.org/10.1145/3140587.3062385 · OSTI ID: 1440662

We present an approach to optimize the cache locality of recursive programs by dynamically splicing (recursively interleaving) the execution of distinct function invocations. By utilizing data effect annotations, we identify concurrency and data reuse opportunities across function invocations and interleave them to reduce reuse distance. We present algorithms that efficiently track effects in recursive programs, detect interference and dependencies, and interleave execution of function invocations using user-level (non-kernel) lightweight threads. To enable multi-core execution, a program is parallelized using a nested fork/join programming model. Our cache optimization strategy is designed to work in the context of a random work stealing scheduler. We present an implementation using the MIT Cilk framework that demonstrates significant improvements in sequential and parallel performance, competitive with a state-of-the-art compile-time optimizer for loop programs and a domain-specific optimizer for stencil programs.
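
The idea of interleaving invocations to shorten reuse distance can be illustrated with a small sketch. It is not the authors' algorithm or the Cilk-based implementation: Python generators stand in for user-level lightweight threads, the interleaving is a plain round-robin at single-access granularity rather than a recursive splice, and the two invocations only read the data, so they are assumed to be free of interference without any effect tracking. The names `traverse`, `interleave`, and `reuse_distances` are hypothetical helpers introduced for this illustration.

```python
from itertools import chain, zip_longest


def traverse(lo, hi):
    """Recursive divide-and-conquer read of indices [lo, hi).

    Yields every index it touches, so the caller can suspend and
    resume the invocation like a lightweight thread.
    """
    if hi - lo <= 1:
        yield from range(lo, hi)
        return
    mid = (lo + hi) // 2
    yield from traverse(lo, mid)
    yield from traverse(mid, hi)


def interleave(*tasks):
    """Round-robin the tasks, resuming each one for a single access."""
    done = object()  # marks tasks that have already finished
    for group in zip_longest(*tasks, fillvalue=done):
        for access in group:
            if access is not done:
                yield access


def reuse_distances(trace):
    """Distinct addresses touched between successive uses of the same address."""
    last_seen = {}
    distances = []
    for t, addr in enumerate(trace):
        if addr in last_seen:
            distances.append(len(set(trace[last_seen[addr] + 1:t])))
        last_seen[addr] = t
    return distances


n = 8
# Two invocations over the same data, run back to back.
sequential = list(chain(traverse(0, n), traverse(0, n)))
# The same two invocations, interleaved access by access.
spliced = list(interleave(traverse(0, n), traverse(0, n)))

seq_rd = reuse_distances(sequential)  # every reuse is 7 distinct addresses away
spl_rd = reuse_distances(spliced)     # every reuse is immediate (distance 0)
print(max(seq_rd), max(spl_rd))
```

In the back-to-back order, each element is reused only after the rest of the array has been touched, so the reuse distance grows with the data size; in the interleaved order the second use follows the first immediately. Real programs also write data, which is why the approach described above relies on effect annotations to decide when interleaving is safe.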

Research Organization: Pacific Northwest National Laboratory (PNNL), Richland, WA (US)

Sponsoring Organization: USDOE

DOE Contract Number: AC05-76RL01830

OSTI ID: 1440662

Report Number(s): PNNL-SA-123961; KJ0402000

Country of Publication: United States

Language: English
