Summary: Design and Implementation of an Efficient
Thread Partitioning Algorithm
Jose Nelson Amaral, Guang Gao, Erturk Dogan Kocalar,
Patrick O'Neill, Xinan Tang
Computer Architecture and Parallel Systems Laboratory,
University of Delaware, Newark, DE, USA, http://www.capsl.udel.edu
Dep. of Comp. Science, Univ. of Alberta, Canada, http://www.cs.ualberta.ca
Abstract. The development of fine-grain multi-threaded program execution
models has created an interesting challenge: how to partition a program
into threads that can exploit machine parallelism, achieve latency
tolerance, and maintain reasonable locality of reference? A successful
algorithm must produce a thread partition that best utilizes multiple
execution units on a single processing node and handles long and
unpredictable latencies.
In this paper, we introduce a new thread partitioning algorithm that can
meet the above challenge for a range of machine architecture models. A
quantitative affinity heuristic is introduced to guide the placement of
operations into threads. This heuristic addresses the trade-off between
exploiting parallelism and preserving locality. The algorithm is
surprisingly simple due to the use of a time-ordered event list to account for the