Summary: Thread Block Compaction for Efficient SIMT Control Flow
Wilson W. L. Fung and Tor M. Aamodt
University of British Columbia
wwlfung@ece.ubc.ca, aamodt@ece.ubc.ca
Abstract
Manycore accelerators such as graphics processor units (GPUs) organize processing units into single-instruction, multiple-data "cores" to improve throughput per unit hardware cost. Programming models for these accelerators encourage applications to run kernels with large groups of parallel scalar threads. The hardware groups these threads into warps/wavefronts and executes them in lockstep, dubbed single-instruction, multiple-thread (SIMT) by NVIDIA. While current GPUs employ a per-warp (or per-wavefront) stack to manage divergent control flow, this stack incurs decreased efficiency for applications with nested, data-dependent control flow. In this paper, we propose and evaluate the benefits of extending the sharing of resources in a block of warps, already used for scratchpad memory, to exploit control flow locality among threads (where ...
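The nested, data-dependent control flow mentioned in the abstract is the case where a per-warp reconvergence stack loses SIMD efficiency: threads in the same warp that take different branch outcomes are executed one path at a time, with the lanes on the other path masked off. The CUDA sketch below is illustrative only and is not taken from the paper; the kernel name classify and its inputs are hypothetical. It simply shows the kind of data-dependent nested branching that produces warp divergence, which thread block compaction aims to mitigate by regrouping threads across the warps of a block.

    // Illustrative sketch only: nested, data-dependent branches of the kind
    // that cause SIMT branch divergence. Threads of one warp that take
    // different paths run serially under a per-warp reconvergence stack,
    // with the lanes on the other path masked off.
    __global__ void classify(const float *in, float *out, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;                 // guard against out-of-range threads

        float v = in[i];
        if (v > 0.0f) {                     // outer branch depends on the data
            if (v > 1.0f)                   // nested branch: divergence can compound
                out[i] = logf(v);
            else
                out[i] = v * v;
        } else {
            out[i] = 0.0f;                  // lanes here idle while the warp runs
        }                                   // the taken side, and vice versa
    }

A warp whose threads disagree on these data-dependent predicates executes each path with only a fraction of its SIMD lanes active; that utilization loss is the inefficiency the paper targets.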
Source: Aamodt, Tor - Department of Electrical and Computer Engineering, University of British Columbia
Collections: Engineering; Computer Technologies and Information Sciences