Summary: Decoupled Software Pipelining with the
Synchronization Array
Ram Rangan Neil Vachharajani Manish Vachharajani David I. August
Department of Computer Science
Princeton University
{ram, nvachhar, manishv, august}@cs.princeton.edu
Abstract
Despite the success of instruction-level parallelism (ILP)
optimizations in increasing the performance of microprocessors,
certain codes remain elusive. In particular, codes
containing recursive data structure (RDS) traversal loops
have been largely immune to ILP optimizations, due to
the fundamental serialization and variable latency of the
loop-carried dependence through a pointer-chasing load.
To address these and other situations, we introduce decoupled
software pipelining (DSWP), a technique that statically
splits a single-threaded sequential loop into multiple
non-speculative threads, each of which performs useful
computation essential for overall program correctness.
The resulting threads execute on thread-parallel architec-