Technique for grouping instructions into independent strands
A device compiler and linker is configured to group instructions into different strands for execution by different threads based on the dependence of those instructions on other, long-latency instructions. A thread may execute a strand that includes long-latency instructions, and then hardware resources previously allocated for the execution of that thread may be de-allocated from the thread and re-allocated to another thread. The other thread may then execute another strand while the long-latency instructions are in flight. With this approach, the other thread is not required to wait for the long-latency instructions to complete before acquiring hardware resources and initiating execution of the other strand, thereby eliminating at least a portion of the time that the other thread would otherwise spend waiting.
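The grouping step described in the abstract can be illustrated with a minimal sketch. It is not the patented method: the abstract does not spell out the grouping rule, so the sketch assumes one, namely that a strand ends just before the first instruction that reads the result of a long-latency instruction issued in the same strand. The `Instr` record, the `split_into_strands` function, and the register names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction record: one destination, some sources."""
    dest: str
    srcs: tuple = ()
    long_latency: bool = False  # e.g. a global memory load


def split_into_strands(instrs):
    """Partition a straight-line instruction list into strands.

    Assumed rule (not taken from the patent text): start a new strand
    at the first instruction that consumes a value produced by a
    long-latency instruction issued in the current strand.
    """
    strands = []
    current = []
    pending = set()  # results of long-latency instructions still in flight
    for ins in instrs:
        if any(src in pending for src in ins.srcs):
            # This instruction would have to wait, so close the strand here.
            strands.append(current)
            current = []
            pending = set()
        current.append(ins)
        if ins.long_latency:
            pending.add(ins.dest)
        else:
            # A short-latency write replaces any in-flight value of that name.
            pending.discard(ins.dest)
    if current:
        strands.append(current)
    return strands


program = [
    Instr("a", ("p",), long_latency=True),   # load
    Instr("b", ("q",), long_latency=True),   # load
    Instr("i", ("i0",)),                     # independent arithmetic
    Instr("c", ("a", "b")),                  # needs both loads
    Instr("d", ("c",), long_latency=True),   # dependent load
    Instr("e", ("c", "i")),                  # independent of d
    Instr("f", ("d", "e")),                  # needs d
]

strands = split_into_strands(program)
for number, strand in enumerate(strands):
    print(number, [ins.dest for ins in strand])
```

Under this assumed rule, the two loads and the independent arithmetic share the first strand, so the long-latency operations are already in flight when the strand ends. That boundary is the point at which, per the abstract, hardware resources could be re-allocated to another thread.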
- Research Organization: NVIDIA Corp., Santa Clara, CA (United States)
- Sponsoring Organization: USDOE
- DOE Contract Number: B599861; HR0011-13-3-0001
- Assignee: NVIDIA Corporation (Santa Clara, CA)
- Patent Number(s): 9,645,802
- Application Number: 13/961,097
- OSTI ID: 1531988
- Resource Relation: Patent File Date: 2013-08-07
- Country of Publication: United States
- Language: English
Data processing graph compilation | patent | July 2018
System and method for managing static divergence in a SIMD computing architecture | patent | March 2018
Similar Records
Two fundamental issues in multiprocessing. Technical report
Single-pass parallel prefix scan with dynamic look back