Title: Graphics processor unit with opportunistic inter-path reconvergence

Patent · OSTI ID: 2222121

A graphics processing unit and methods for compiling and executing instructions with opportunistic inter-path reconvergence are provided. A graphics processing unit may access computer executable instructions mapped to code blocks of a control flow for a warp. The code blocks may include an immediate dominator block and an immediate post dominator block. The graphics processing unit may store a first thread mask associated with the first code block. The first thread mask may include a plurality of bits indicative of the active or non-active status for the threads of the warp, respectively. The graphics processing unit may store a second thread mask corresponding to an intermediate code block between the immediate dominator block and the immediate post dominator block. The graphics processing unit may execute, with threads indicated as active by the first thread mask, instructions of the intermediate code block with a first operand or a second operand depending on the second thread mask.
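
The mask-driven operand selection described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented hardware design: the 8-lane warp width, the add operation, and the names `is_active` and `execute_merged_block` are all hypothetical and chosen for illustration. It shows one instruction being applied to every lane marked active in the first thread mask, with the second thread mask deciding per lane which of two operands is used.

```python
WARP_SIZE = 8  # assumption for readability; real warps are often 32 threads wide


def is_active(mask: int, lane: int) -> bool:
    """Return True if the bit for `lane` is set in the thread mask."""
    return (mask >> lane) & 1 == 1


def execute_merged_block(regs, first_mask, second_mask, operand_a, operand_b):
    """Model one add instruction of an intermediate code block (hypothetical).

    Lanes active in `first_mask` execute the instruction. Among those lanes,
    a set bit in `second_mask` selects `operand_a`; a clear bit selects
    `operand_b`. Lanes inactive in `first_mask` keep their old value.
    """
    out = list(regs)
    for lane in range(WARP_SIZE):
        if not is_active(first_mask, lane):
            continue  # lane is masked off for this block
        operand = operand_a if is_active(second_mask, lane) else operand_b
        out[lane] = regs[lane] + operand
    return out


regs = [0] * WARP_SIZE
first_mask = 0b00111111   # lanes 0-5 are active, lanes 6-7 are not
second_mask = 0b00000111  # lanes 0-2 use the first operand, lanes 3-5 the second
result = execute_merged_block(regs, first_mask, second_mask, 10, 20)
print(result)
```

In this model both groups of lanes execute the shared instruction in a single pass rather than one path at a time, which is the intuition behind reconverging divergent paths early. How the actual hardware stores the masks and selects operands is not specified in the abstract.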

Research Organization:
Purdue Univ., West Lafayette, IN (United States)
Sponsoring Organization:
USDOE
DOE Contract Number:
SC0010295
Assignee:
Purdue Research Foundation (West Lafayette, IN)
Patent Number(s):
11,726,785
Application Number:
17/491,057
OSTI ID:
2222121
Resource Relation:
Patent File Date: 09/30/2021
Country of Publication:
United States
Language:
English
