Title: Locality Aware Concurrent Start for Stencil Applications

Abstract

Stencil computations are at the heart of many physical simulations used in scientific codes. Thus, there exists a plethora of optimization efforts for this family of computations. Among these techniques, tiling techniques that allow concurrent start have proven to be very efficient in providing better performance for these critical kernels. Nevertheless, with many core designs being the norm, these optimization techniques might not be able to fully exploit locality (both spatial and temporal) on multiple levels of the memory hierarchy without compromising parallelism. It is no longer true that the machine can be seen as a homogeneous collection of nodes with caches, main memory and an interconnect network. New architectural designs exhibit complex grouping of nodes, cores, threads, caches and memory connected by an ever evolving network-on-chip design. These new designs may benefit greatly from carefully crafted schedules and groupings that encourage parallel actors (i.e. threads, cores or nodes) to be aware of the computational history of other actors in close proximity. In this paper, we provide an efficient tiling technique that allows hierarchical concurrent start for memory hierarchy aware tile groups. Each execution schedule and tile shape exploit the available parallelism, load balance and locality present in the given applications. We demonstrate our technique on the Intel Xeon Phi architecture with selected and representative stencil kernels. We show improvement ranging from 5.58% to 31.17% over existing state-of-the-art techniques.

Authors:
Shrestha, Sunil; Gao, Guang R.; Manzano Franco, Joseph B.; Marquez, Andres; Feo, John T.
Publication Date:
2015-02-10
Research Org.:
Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1194299
Report Number(s):
PNNL-SA-108612
KJ0402000
DOE Contract Number:  
AC05-76RL01830
Resource Type:
Conference
Resource Relation:
Conference: IEEE/ACM International Symposium on Code Generation and Optimization (CGO 2015), February 7-11, 2015, San Francisco, California, 157-166
Country of Publication:
United States
Language:
English
Subject:
Locality Aware execution; jagged tiling; polyhedral framework

Citation Formats

Shrestha, Sunil, Gao, Guang R., Manzano Franco, Joseph B., Marquez, Andres, and Feo, John T. Locality Aware Concurrent Start for Stencil Applications. United States: N. p., 2015. Web. doi:10.1109/CGO.2015.7054196.
Shrestha, Sunil, Gao, Guang R., Manzano Franco, Joseph B., Marquez, Andres, & Feo, John T. Locality Aware Concurrent Start for Stencil Applications. United States. doi:10.1109/CGO.2015.7054196.
Shrestha, Sunil, Gao, Guang R., Manzano Franco, Joseph B., Marquez, Andres, and Feo, John T. 2015. "Locality Aware Concurrent Start for Stencil Applications". United States. doi:10.1109/CGO.2015.7054196.
@article{osti_1194299,
title = {Locality Aware Concurrent Start for Stencil Applications},
author = {Shrestha, Sunil and Gao, Guang R. and Manzano Franco, Joseph B. and Marquez, Andres and Feo, John T.},
abstractNote = {Stencil computations are at the heart of many physical simulations used in scientific codes. Thus, there exists a plethora of optimization efforts for this family of computations. Among these techniques, tiling techniques that allow concurrent start have proven to be very efficient in providing better performance for these critical kernels. Nevertheless, with many core designs being the norm, these optimization techniques might not be able to fully exploit locality (both spatial and temporal) on multiple levels of the memory hierarchy without compromising parallelism. It is no longer true that the machine can be seen as a homogeneous collection of nodes with caches, main memory and an interconnect network. New architectural designs exhibit complex grouping of nodes, cores, threads, caches and memory connected by an ever evolving network-on-chip design. These new designs may benefit greatly from carefully crafted schedules and groupings that encourage parallel actors (i.e. threads, cores or nodes) to be aware of the computational history of other actors in close proximity. In this paper, we provide an efficient tiling technique that allows hierarchical concurrent start for memory hierarchy aware tile groups. Each execution schedule and tile shape exploit the available parallelism, load balance and locality present in the given applications. We demonstrate our technique on the Intel Xeon Phi architecture with selected and representative stencil kernels. We show improvement ranging from 5.58% to 31.17% over existing state-of-the-art techniques.},
doi = {10.1109/CGO.2015.7054196},
place = {United States},
year = {2015},
month = feb
}

