Fast Parallel Sorting under
LogP: from theory to practice
DAVID E. CULLER, ANDREA C. DUSSEAU, RICHARD P. MARTIN,
KLAUS ERIK SCHAUSER
1.1 ABSTRACT
The LogP model characterizes the performance of modern parallel machines with a
small set of parameters: the communication latency (L), overhead (o), bandwidth (g),
and the number of processors (P). In this paper, we analyze four parallel sorting
algorithms (bitonic, column, radix, and sample sort) under LogP. We develop
implementations of these algorithms in a parallel extension to C and compare their
actual performance on a CM-5 with 32 to 512 processors against that predicted by LogP
using parameter values for this machine. In our experience, the model served as a
valuable guide throughout the development of the fast parallel sorts and revealed
subtle defects in the implementations. The final observed performance closely matches
the prediction across a broad range of problem and machine sizes.
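
To make the four parameters concrete, the following is a minimal sketch of how LogP is commonly used to estimate communication time. It assumes the standard reading of the model, in which g is the gap (the minimum interval between consecutive messages at one processor, that is, the reciprocal of per-processor bandwidth) and a single small message costs o + L + o from start of send to end of receive. The class name, the function names, and the numeric values are hypothetical placeholders chosen for illustration; they are not the CM-5 parameter values used in this paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogP:
    """LogP parameters, all times in the same (arbitrary) unit."""
    L: float  # network latency of one small message
    o: float  # processor overhead to send or to receive one message
    g: float  # gap: minimum interval between consecutive messages
    P: int    # number of processors


def one_message_time(m: LogP) -> float:
    """Send overhead + latency + receive overhead for one small message."""
    return m.o + m.L + m.o


def stream_time(m: LogP, n: int) -> float:
    """Time until the last of n back-to-back messages from one sender to
    one receiver has been received. The sender can issue a new message
    only every max(g, o) time units."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return (n - 1) * max(m.g, m.o) + one_message_time(m)


# Illustrative placeholder values only (not measurements).
model = LogP(L=6.0, o=2.0, g=4.0, P=32)
print(one_message_time(model))   # 2.0 + 6.0 + 2.0 = 10.0
print(stream_time(model, 100))   # 99 * 4.0 + 10.0 = 406.0
```

With these placeholder values the stream is limited by the gap rather than the overhead, so the per-message cost approaches g as n grows; this is the kind of estimate that can be compared against measured running times.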
1.2 INTRODUCTION
Fast sorting is important in a wide variety of practical applications, is interesting to
study from a theoretical viewpoint, and offers a wealth of novel parallel solutions. The
richness of this particular problem arises, in part, because it fundamentally requires