Single-pass parallel prefix scan with dynamic look back
Abstract
One embodiment of the present invention performs a parallel prefix scan in a single pass that incorporates variable look-back. A parallel processing unit (PPU) subdivides a list of inputs into sequentially-ordered segments and assigns each segment to a streaming multiprocessor (SM) included in the PPU. Notably, the SMs may operate in parallel. Each SM executes write operations on a segment descriptor that includes the status, aggregate, and inclusive-prefix associated with the assigned segment. Further, each SM may execute read operations on segment descriptors associated with other segments. In operation, each SM may perform reduction operations to determine a segment-wide aggregate, may perform look-back operations across multiple preceding segments to determine an exclusive-prefix, and may perform a scan seeded with the exclusive prefix to generate output data. Advantageously, the PPU performs one read operation per input, thereby reducing the time required to execute the prefix scan relative to prior-art parallel implementations.
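The flow described in the abstract (per-segment reduction, look-back over preceding segment descriptors, then a scan seeded with the exclusive prefix) can be illustrated with a small CPU-side sketch. This is not the patented GPU implementation. It assumes addition as the scan operator, uses one Python thread per segment as a stand-in for an SM, and the status names `INVALID`, `AGGREGATE`, and `PREFIX`, the function `single_pass_scan`, and the `segment_size` parameter are hypothetical names chosen for illustration.

```python
import threading
import time

# Hypothetical status flags for a segment descriptor (names are assumptions).
INVALID, AGGREGATE, PREFIX = "X", "A", "P"


def single_pass_scan(inputs, segment_size=4):
    """Inclusive prefix sum; each segment is handled by its own thread."""
    n = len(inputs)
    num_segments = (n + segment_size - 1) // segment_size
    # One descriptor per segment: (status, aggregate, inclusive_prefix).
    # Every update replaces the whole tuple, so a reader never sees a
    # status that is out of step with its values.
    descriptors = [(INVALID, 0, 0)] * num_segments
    output = [0] * n

    def worker(seg):
        lo = seg * segment_size
        hi = min(lo + segment_size, n)
        local = inputs[lo:hi]      # the only read of this segment's inputs
        aggregate = sum(local)     # segment-wide reduction
        exclusive = 0

        if seg == 0:
            # The first segment has no predecessors: its aggregate is
            # already its inclusive prefix.
            descriptors[0] = (PREFIX, aggregate, aggregate)
        else:
            # Publish the aggregate so later segments can use it.
            descriptors[seg] = (AGGREGATE, aggregate, 0)
            # Look back over as many predecessors as necessary.
            pred = seg - 1
            while True:
                status, agg, incl = descriptors[pred]
                if status == INVALID:
                    time.sleep(0)  # predecessor not ready yet; yield and retry
                elif status == PREFIX:
                    exclusive += incl  # full prefix found; stop looking back
                    break
                else:
                    exclusive += agg   # only an aggregate; keep looking back
                    pred -= 1
            descriptors[seg] = (PREFIX, aggregate, exclusive + aggregate)

        # Scan the locally held inputs, seeded with the exclusive prefix.
        running = exclusive
        for i, x in enumerate(local):
            running += x
            output[lo + i] = running

    threads = [threading.Thread(target=worker, args=(s,))
               for s in range(num_segments)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return output


if __name__ == "__main__":
    print(single_pass_scan([3, 1, 4, 1, 5, 9, 2, 6], segment_size=3))
    # prints [3, 4, 8, 9, 14, 23, 25, 31]
```

How far each worker looks back depends on which predecessors have already published an inclusive prefix, so the look-back distance varies from run to run while the result stays the same. Because the sketch accumulates predecessor aggregates in reverse order, it relies on the operator being commutative. A non-commutative operator would need the partial results combined in segment order.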
- Inventors:
- Merrill, Duane
- Issue Date:
- 2018-03-27
- Research Org.:
- NVIDIA Corp., Santa Clara, CA (United States)
- Sponsoring Org.:
- USDOE
- OSTI Identifier:
- 1532070
- Patent Number(s):
- 9928033
- Application Number:
- 14/043,626
- Assignee:
- NVIDIA Corporation (Santa Clara, CA)
- Patent Classifications (CPCs):
- G - PHYSICS; G06 - COMPUTING; G06F - ELECTRIC DIGITAL DATA PROCESSING
- DOE Contract Number:
- B599861; HR0011-13-3-0001
- Resource Type:
- Patent
- Resource Relation:
- Patent File Date: 2013-10-01
- Country of Publication:
- United States
- Language:
- English
Citation Formats
Merrill, Duane. Single-pass parallel prefix scan with dynamic look back. United States: N. p., 2018. Web.
Merrill, Duane. Single-pass parallel prefix scan with dynamic look back. United States.
Merrill, Duane. 2018. "Single-pass parallel prefix scan with dynamic look back". United States. https://www.osti.gov/servlets/purl/1532070.
@article{osti_1532070,
title = {Single-pass parallel prefix scan with dynamic look back},
author = {Merrill, Duane},
abstractNote = {One embodiment of the present invention performs a parallel prefix scan in a single pass that incorporates variable look-back. A parallel processing unit (PPU) subdivides a list of inputs into sequentially-ordered segments and assigns each segment to a streaming multiprocessor (SM) included in the PPU. Notably, the SMs may operate in parallel. Each SM executes write operations on a segment descriptor that includes the status, aggregate, and inclusive-prefix associated with the assigned segment. Further, each SM may execute read operations on segment descriptors associated with other segments. In operation, each SM may perform reduction operations to determine a segment-wide aggregate, may perform look-back operations across multiple preceding segments to determine an exclusive-prefix, and may perform a scan seeded with the exclusive prefix to generate output data. Advantageously, the PPU performs one read operation per input, thereby reducing the time required to execute the prefix scan relative to prior-art parallel implementations.},
doi = {},
journal = {},
number = {},
volume = {},
place = {United States},
year = {2018},
month = {3}
}
Works referenced in this record:
Global-view abstractions for user-defined reductions and scans
conference, January 2006
- Deitz, Steven J.; Callahan, David; Chamberlain, Bradford L.
- Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming - PPoPP '06