Summary: Computation-Communication Overlap on
Network-of-Workstation Multiprocessors
Gary Liu and Tarek S. Abdelrahman
Department of Electrical and Computer Engineering
The University of Toronto
Toronto, Ontario, Canada M5S 3G4
Abstract
This paper describes and evaluates a compiler trans-
formation that improves the performance of parallel
programs on Network-of-Workstation (NOW) shared-
memory multiprocessors. The transformation overlaps
the communication time resulting from non-local mem-
ory accesses with the computation time in parallel loops
to effectively hide the latency of the remote accesses. The
transformation peels from a parallel loop iterations that
access remote data and re-schedules them after the ex-
ecution of iterations that access only local data (local-
only iterations). Asynchronous prefetching of remote
data is used to overlap non-local access latency with the
execution of local-only iterations. Experimental eval-