Summary: Using SIMD Registers and Instructions to Enable
Instruction-Level Parallelism in Sorting Algorithms
Timothy Furtak
furtak@cs.ualberta.ca
José Nelson Amaral
amaral@cs.ualberta.ca
Robert Niewiadomski
niewiado@cs.ualberta.ca
Department of Computing Science
University of Alberta, Edmonton, AB, Canada
ABSTRACT
Most contemporary processors offer some version of Single
Instruction, Multiple Data (SIMD) machinery: vector registers
and instructions to manipulate data stored in such
registers. The central idea of this paper is to use these
SIMD resources to improve the performance of the tail of
recursive sorting algorithms. When the number of elements
to be sorted falls to a set threshold, the data is loaded into
the vector registers, manipulated in-register, and the result
is stored back to memory. Three implementations of sorting