Graphics processor unit with opportunistic inter-path reconvergence

Kulkarni, Milind; Hbeika, Jad

Patent · August 15, 2023

OSTI ID: 2222121

A graphics processing unit and methods for compiling and executing instructions with opportunistic inter-path reconvergence are provided. A graphics processing unit may access computer-executable instructions mapped to code blocks of a control flow for a warp. The code blocks may include an immediate dominator block and an intermediate post dominator block. The graphics processing unit may store a first thread mask associated with the first code block. The first thread mask may include a plurality of bits indicative of the active or non-active status of the respective threads of the warp. The graphics processing unit may store a second thread mask corresponding to an intermediate code block between the immediate dominator block and the intermediate post dominator block. The graphics processing unit may execute, with the threads indicated as active by the first thread mask, instructions of the intermediate code block with a first operand or a second operand, depending on the second thread mask.
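
The two-mask execution described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented hardware design: the names `WARP_SIZE`, `lane_active`, and `execute_merged` are hypothetical, the warp is narrowed to 8 lanes for readability, and the convention that a set bit in the second mask selects the first operand is an assumption made for the example.

```python
from typing import Callable, List

WARP_SIZE = 8  # assumption for readability; hardware warps are commonly 32 lanes


def lane_active(mask: int, lane: int) -> bool:
    """Return True when bit `lane` of `mask` is set."""
    return (mask >> lane) & 1 == 1


def execute_merged(
    op: Callable[[int, int], int],
    dest: List[int],
    src: List[int],
    first_operand: List[int],
    second_operand: List[int],
    first_mask: int,
    second_mask: int,
) -> None:
    """Issue `op` once for every lane that is active in `first_mask`.

    Each active lane reads `first_operand` when its bit in `second_mask`
    is set and `second_operand` otherwise. Inactive lanes are left untouched.
    """
    for lane in range(WARP_SIZE):
        if not lane_active(first_mask, lane):
            continue
        if lane_active(second_mask, lane):
            chosen = first_operand
        else:
            chosen = second_operand
        dest[lane] = op(src[lane], chosen[lane])


# Hypothetical example: lanes 0-5 are active, lanes 6-7 are idle.
first_mask = 0b00111111
# Lanes 0-2 arrived on one control-flow path, lanes 3-5 on the other.
second_mask = 0b00000111

src = [1] * WARP_SIZE
operand_a = [10] * WARP_SIZE
operand_b = [20] * WARP_SIZE
dest = [0] * WARP_SIZE

execute_merged(lambda x, y: x + y, dest, src, operand_a, operand_b,
               first_mask, second_mask)
print(dest)  # [11, 11, 11, 21, 21, 21, 0, 0]
```

In this model the intermediate block's instruction is issued once for all six active lanes, rather than once per divergent path, while each lane still reads the operand that belongs to the path it came from.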

Research Organization: Purdue Univ., West Lafayette, IN (United States)

Sponsoring Organization: USDOE

DOE Contract Number: SC0010295

Assignee: Purdue Research Foundation (West Lafayette, IN)

Patent Number(s): 11,726,785

Application Number: 17/491,057

Resource Relation: Patent File Date: 09/30/2021

Country of Publication: United States

Language: English
