Summary: Fast Parallel GPU-Sorting Using a Hybrid
Erik Sintorn, Ulf Assarsson
Department of Computer Science and Engineering
Chalmers University Of Technology
This paper presents an algorithm for fast sorting of large lists using modern GPUs.
The method achieves high speed by efficiently utilizing the parallelism of the GPU
throughout the whole algorithm. Initially, a GPU-based bucketsort or quicksort splits
the list into enough sublists, which are then sorted in parallel using merge-sort. The
algorithm has complexity n log n, and for lists of 8M elements on a single
GeForce 8800GTS-512 it is 2.5 times as fast as the bitonic sort algorithms, with
standard complexity n(log n)^2, which were long considered the fastest
for GPU sorting. It is 6 times faster than single-CPU quicksort and 10% faster
than the recent GPU-based radix sort. Finally, the algorithm is further parallelized
to utilize two graphics cards, resulting in an additional 1.8 times speedup.
Key words: parallelism, sorting, GPU-algorithms
PACS: 07.05.Hd, 07.05.Kf
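
The two-phase structure described in the summary (split the list into sublists, then merge-sort each sublist independently) can be illustrated with a minimal CPU-side sketch. This is not the authors' GPU implementation: the function names `hybrid_sort` and `merge_sort`, the `num_buckets` parameter, and the use of equal-width value ranges to choose buckets are all assumptions made for illustration, and the sublists are sorted one after another here rather than in parallel.

```python
def merge_sort(values):
    """Plain top-down merge sort; stands in for the per-sublist merge-sort."""
    if len(values) <= 1:
        return list(values)
    mid = len(values) // 2
    left, right = merge_sort(values[:mid]), merge_sort(values[mid:])
    merged, i, j = [], 0, 0
    # Merge the two sorted halves into one sorted list.
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def hybrid_sort(values, num_buckets=8):
    """Bucket split followed by an independent merge sort of every bucket.

    Assumption: buckets are equal-width value ranges between min and max.
    Every element of bucket k is <= every element of bucket k + 1, so the
    sorted buckets can simply be concatenated.
    """
    if not values:
        return []
    lo, hi = min(values), max(values)
    if lo == hi:
        return list(values)  # all elements equal, nothing to sort
    width = (hi - lo) / num_buckets
    buckets = [[] for _ in range(num_buckets)]
    for v in values:
        # Clamp so that the maximum value lands in the last bucket.
        idx = min(int((v - lo) / width), num_buckets - 1)
        buckets[idx].append(v)
    result = []
    # The buckets are independent of each other, which is what makes a
    # parallel sort of the sublists possible; here they run sequentially.
    for bucket in buckets:
        result.extend(merge_sort(bucket))
    return result
```

Because no element has to move between buckets after the split, the per-bucket sorts need no communication with each other, which is the property the summary relies on for parallel execution.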