 
Summary: Fast Parallel GPU-Sorting Using a Hybrid
Algorithm
Erik Sintorn, Ulf Assarsson
Department of Computer Science and Engineering
Chalmers University Of Technology
Gothenburg, Sweden
Abstract
This paper presents an algorithm for fast sorting of large lists using modern GPUs.
The method achieves high speed by efficiently utilizing the parallelism of the GPU
throughout the whole algorithm. Initially, a GPU-based bucketsort or quicksort splits
the list into enough sublists, which are then sorted in parallel using mergesort. The
algorithm is of complexity n log n, and for lists of 8M elements on a single
GeForce 8800GTS-512 it is 2.5 times as fast as the bitonic sort algorithms, with
standard complexity of n(log n)^2, which were long considered to be the fastest
for GPU sorting. It is 6 times faster than single-CPU quicksort, and 10% faster
than the recent GPU-based radix sort. Finally, the algorithm is further parallelized
to utilize two graphics cards, resulting in yet another 1.8 times speedup.
Key words: parallelism, sorting, GPU-algorithms
PACS: 07.05.Hd, 07.05.Kf
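The two-phase structure described in the abstract (split the list into ordered
sublists, then mergesort each sublist independently) can be illustrated with a
small sequential sketch. This is not the authors' GPU implementation: it runs on
the CPU, the names hybrid_sort, merge_sort, num_buckets and seed are hypothetical,
and choosing pivots from a random sample is an assumption made here only to keep
the example short. On a GPU, the per-sublist sorts would run in parallel.

```python
import random
from bisect import bisect_right


def merge_sort(values):
    """Plain top-down mergesort; stands in for the per-sublist sort."""
    if len(values) <= 1:
        return list(values)
    mid = len(values) // 2
    left, right = merge_sort(values[:mid]), merge_sort(values[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def hybrid_sort(values, num_buckets=8, seed=0):
    """Bucket split followed by an independent mergesort of each bucket."""
    values = list(values)
    if len(values) <= 1:
        return values

    # Assumption: pivots are taken from a sorted random sample of the input.
    rng = random.Random(seed)
    sample = sorted(rng.sample(values, min(len(values), 4 * num_buckets)))
    step = len(sample) / num_buckets
    pivots = [sample[int(k * step)] for k in range(1, num_buckets)]

    # Phase 1: place each element in the bucket delimited by the pivots,
    # so every element of bucket b is <= every element of bucket b + 1.
    buckets = [[] for _ in range(num_buckets)]
    for v in values:
        buckets[bisect_right(pivots, v)].append(v)

    # Phase 2: the buckets are independent, so they could be sorted in
    # parallel; here they are sorted one after another.
    result = []
    for bucket in buckets:
        result.extend(merge_sort(bucket))
    return result
```

Calling hybrid_sort(data) on a list of numbers returns a new sorted list. Because
the buckets are already ordered relative to each other, concatenating the sorted
buckets needs no final merge step.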
1 Introduction
