Single-pass parallel prefix scan with dynamic look back
One embodiment of the present invention performs a parallel prefix scan in a single pass that incorporates variable look-back. A parallel processing unit (PPU) subdivides a list of inputs into sequentially ordered segments and assigns each segment to a streaming multiprocessor (SM) included in the PPU. Notably, the SMs may operate in parallel. Each SM executes write operations on a segment descriptor that includes the status, aggregate, and inclusive prefix associated with the assigned segment. Further, each SM may execute read operations on segment descriptors associated with other segments. In operation, each SM may perform reduction operations to determine a segment-wide aggregate, may perform look-back operations across multiple preceding segments to determine an exclusive prefix, and may perform a scan seeded with the exclusive prefix to generate output data. Advantageously, the PPU performs one read operation per input, thereby reducing the time required to execute the prefix scan relative to prior-art parallel implementations.
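The steps named in the abstract (segment-wide reduction, look-back across preceding segment descriptors, and a scan seeded with the exclusive prefix) can be illustrated with a minimal sequential sketch. This is not the patented implementation: it is a single-threaded Python simulation that assumes integer addition as the scan operator, and the names `SegmentDescriptor`, `look_back`, `single_pass_scan`, the status flags, and the `order` parameter are all hypothetical. On a real PPU the SMs run concurrently and would wait on a descriptor that is not yet valid; here every aggregate is published first so that the look-back distance varies with the order in which segments finish.

```python
from dataclasses import dataclass

# Hypothetical status flags for a segment descriptor (names are assumptions).
INVALID, AGGREGATE_READY, PREFIX_READY = "X", "A", "P"


@dataclass
class SegmentDescriptor:
    status: str = INVALID
    aggregate: int = 0
    inclusive_prefix: int = 0


def look_back(descriptors, seg):
    """Walk backwards over preceding descriptors, summing aggregates until a
    descriptor with a finished inclusive prefix is found."""
    exclusive = 0
    for prev in range(seg - 1, -1, -1):
        d = descriptors[prev]
        if d.status == PREFIX_READY:
            # The predecessor already knows everything before it: stop here.
            return exclusive + d.inclusive_prefix
        if d.status == AGGREGATE_READY:
            # Only the segment-wide aggregate is known: add it, keep walking.
            exclusive += d.aggregate
        else:
            # A concurrent implementation would wait here instead.
            raise RuntimeError("descriptor not yet published")
    return exclusive  # reached the start of the input


def single_pass_scan(inputs, segment_size, order=None):
    """Inclusive prefix sum. `order` is the (simulated) order in which
    segments perform their look-back; default is left to right."""
    # Each slice stands in for a segment loaded once into SM-local storage.
    segments = [inputs[i:i + segment_size]
                for i in range(0, len(inputs), segment_size)]
    descriptors = [SegmentDescriptor() for _ in segments]

    # Reduction: each simulated SM publishes its segment-wide aggregate.
    for i, seg in enumerate(segments):
        descriptors[i].aggregate = sum(seg)
        descriptors[i].status = AGGREGATE_READY

    output = [None] * len(inputs)
    for i in (range(len(segments)) if order is None else order):
        # Look-back: derive the exclusive prefix from preceding descriptors.
        exclusive = look_back(descriptors, i)
        descriptors[i].inclusive_prefix = exclusive + descriptors[i].aggregate
        descriptors[i].status = PREFIX_READY

        # Scan of the local segment, seeded with the exclusive prefix.
        running = exclusive
        for j, x in enumerate(segments[i]):
            running += x
            output[i * segment_size + j] = running
    return output


data = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
in_order = single_pass_scan(data, 3)
# Last segment first: its look-back must cross all three predecessors.
reversed_order = single_pass_scan(data, 3, order=[3, 2, 1, 0])
```

Both calls produce the same inclusive prefix sums. In the left-to-right case each look-back stops at the immediate predecessor, while in the reversed case later segments must accumulate several aggregates, which is the "variable" part of the look-back.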
- Research Organization: NVIDIA Corp., Santa Clara, CA (United States)
- Sponsoring Organization: USDOE
- DOE Contract Number: B599861; HR0011-13-3-0001
- Assignee: NVIDIA Corporation (Santa Clara, CA)
- Patent Number(s): 9,928,033
- Application Number: 14/043,626
- OSTI ID: 1532070
- Resource Relation: Patent File Date: 2013-10-01
- Country of Publication: United States
- Language: English