Technique for grouping instructions into independent strands
Abstract
A device compiler and linker is configured to group instructions into different strands for execution by different threads based on the dependence of those instructions on other, long-latency instructions. A thread may execute a strand that includes long-latency instructions, and then hardware resources previously allocated for the execution of that thread may be de-allocated from the thread and re-allocated to another thread. The other thread may then execute another strand while the long-latency instructions are in flight. With this approach, the other thread is not required to wait for the long-latency instructions to complete before acquiring hardware resources and initiating execution of the other strand, thereby eliminating at least a portion of the time that the other thread would otherwise spend waiting.
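As a rough illustration of the grouping idea described in the abstract, the Python sketch below splits a straight-line instruction sequence into strands, starting a new strand at the first instruction that reads the result of a long-latency instruction issued earlier in the current strand. The `Instr` record, the `split_into_strands` helper, the register names, and the greedy split rule are all assumptions made for illustration. This record does not describe the patented compiler at this level of detail, so this is a simplified sketch rather than the claimed method.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction record (not taken from the patent)."""
    op: str
    dst: Optional[str]            # destination register, or None
    srcs: Tuple[str, ...] = ()    # source registers
    long_latency: bool = False    # e.g. a global memory load


def split_into_strands(instrs: List[Instr]) -> List[List[Instr]]:
    """Greedy split: close the current strand just before the first
    instruction that reads a register written by a long-latency
    instruction issued in that same strand."""
    strands: List[List[Instr]] = []
    current: List[Instr] = []
    pending = set()  # registers with a long-latency write in flight

    for ins in instrs:
        if any(src in pending for src in ins.srcs):
            # Dependence on an in-flight long-latency result: the
            # instructions so far form one strand, and a new one begins.
            strands.append(current)
            current = []
            pending = set()
        current.append(ins)
        if ins.dst is not None:
            if ins.long_latency:
                pending.add(ins.dst)
            else:
                pending.discard(ins.dst)  # overwritten by a short op

    if current:
        strands.append(current)
    return strands


# Assumed example program: two loads, independent work, then consumers.
program = [
    Instr("load", "r1", ("r0",), long_latency=True),
    Instr("load", "r2", ("r0",), long_latency=True),
    Instr("add", "r3", ("r4", "r5")),
    Instr("mul", "r6", ("r1", "r2")),   # first consumer of the loads
    Instr("store", None, ("r6", "r0")),
]
strands = split_into_strands(program)
for n, strand in enumerate(strands):
    print(n, [ins.op for ins in strand])
```

In this sketch the first strand issues both loads and the independent `add`. The second strand, which needs the loaded values, could be run later by a different thread, after the hardware resources of the first have been released, which is the scheduling benefit the abstract describes.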
- Inventors:
- Mehrara, Mojtaba; Garland, Michael; Diamos, Gregory
- Issue Date:
- Research Org.:
- NVIDIA Corp., Santa Clara, CA (United States)
- Sponsoring Org.:
- USDOE
- OSTI Identifier:
- 1531988
- Patent Number(s):
- 9645802
- Application Number:
- 13/961,097
- Assignee:
- NVIDIA Corporation (Santa Clara, CA)
- Patent Classifications (CPCs):
- G - PHYSICS; G06 - COMPUTING; G06F - ELECTRIC DIGITAL DATA PROCESSING
- DOE Contract Number:
- B599861; HR0011-13-3-0001
- Resource Type:
- Patent
- Resource Relation:
- Patent File Date: 2013-08-07
- Country of Publication:
- United States
- Language:
- English
Citation Formats
Mehrara, Mojtaba, Garland, Michael, and Diamos, Gregory. Technique for grouping instructions into independent strands. United States: N. p., 2017. Web.
Mehrara, Mojtaba, Garland, Michael, & Diamos, Gregory. Technique for grouping instructions into independent strands. United States.
Mehrara, Mojtaba, Garland, Michael, and Diamos, Gregory. 2017. "Technique for grouping instructions into independent strands". United States. https://www.osti.gov/servlets/purl/1531988.
@article{osti_1531988,
title = {Technique for grouping instructions into independent strands},
author = {Mehrara, Mojtaba and Garland, Michael and Diamos, Gregory},
place = {United States},
year = {2017},
month = {5}
}
Works referenced in this record:
Apparatus and method for speculatively executing instructions in a computer system
patent, May 1995
- McKeen, Francis X.; Adler, Michael C.; Emer, Joel S.
- US Patent Document 5,421,022
Multithreaded data processing method with long latency subinstructions
patent, April 2001
- Hwang, Myeong-Eun
- US Patent Document 6,216,220
Local stall control method and structure in a microprocessor
patent, August 2001
- Tremblay, Marc; Yeluri, Sharada
- US Patent Document 6,279,100
Generation of compiler description from architecture description
patent, March 2014
- Braun, Gunnar; Hoffmann, Andreas; Greive, Volker
- US Patent Document 8,677,312
Scheduling of instructions
patent, April 2014
- Braun, Gunnar; Hoffmann, Andreas; Greive, Volker
- US Patent Document 8,689,202
Value speculation on an assist processor to facilitate prefetching for a primary processor
patent-application, December 2001
- Chaudhry, Shailender; Tremblay, Marc
- US Patent Application 09/761360; 20010052064
Supporting out-of-order issue in an execute-ahead processor
patent-application, August 2007
- Chaudhry, Shailender; Tremblay, Marc; Capriolo, Paul
- US Patent Application 11/367814; 20070186081
Diagnostic apparatus and method
patent-application, June 2008
- Reid, Alastair David; Ford, Simon Andrew; Kneebone, Katherine Elizabeth
- US Patent Application 11/907112; 20080133897
Credit-Based Streaming Multiprocessor Warp Scheduling
patent-application, March 2011
- Lindholm, John Erik; Coon, Brett W.; Wierzbicki, Jered
- US Patent Application 12/885299; 20110072244
Opcode Counting for Performance Measurement
patent-application, July 2011
- Gara, Alan; Satterfield, David L.; Walkup, Robert E.
- US Patent Application 12/688773; 20110172969
Works referencing / citing this record:
Data processing graph compilation
patent, July 2018
- Stanfill, Craig W.; Shapiro, Richard
- US Patent Document 10,037,198
System and method for managing static divergence in a SIMD computing architecture
patent, March 2018
- Lo, Chen-Kang; Liao, Shih-wei; Han, Cheng-Ting
- US Patent Document 9,921,838