Summary: Modeling Parallel Sorts with LogP on the CM-5
Andrea Carol Dusseau
Department of Electrical Engineering and Computer Science
Computer Science Division
University of California, Berkeley
Technical Report: UCB//CSD-94-829
Abstract
In this paper, the LogP model is used to analyze four parallel sorting algorithms (bitonic, column, radix,
and sample sort). LogP characterizes the performance of modern parallel machines with a small set of
parameters: the communication latency (L), the overhead (o), the gap (g), which is the reciprocal of
per-processor communication bandwidth, and the number of processors (P). We develop implementations
of these algorithms in Split-C, a parallel extension to C, and compare the performance predicted by LogP
to actual performance on a CM-5 of 32 to 512 processors for a range of problem sizes and input sets.
The sensitivity of the algorithms is evaluated by varying the distribution of key values and the rank
ordering of the input.
The LogP model is shown to be a valuable guide in the development of parallel algorithms and a good
predictor of implementation performance. The model encourages data layouts that minimize communication
and balanced communication schedules that avoid contention. When combined with an empirical model of
local processor performance, LogP predictions closely match observed execution times on uniformly
distributed keys across a broad range of problem and machine sizes for all four algorithms. Communication