Summary: Hardware Transactional Memory for GPU Architectures
Wilson W. L. Fung
Inderpreet Singh
Andrew Brownsword
Tor M. Aamodt
Department of Computer and Electrical Engineering
University of British Columbia
wwlfung@ece.ubc.ca isingh@ece.ubc.ca
andrew@brownsword.ca aamodt@ece.ubc.ca
ABSTRACT
Graphics processor units (GPUs) are designed to efficiently exploit thread-level parallelism (TLP), multiplexing the execution of thousands of concurrent threads on a relatively small set of single-instruction, multiple-thread (SIMT) cores to hide various long-latency operations. While threads within a CUDA block/OpenCL workgroup can communicate efficiently through an intra-core scratchpad memory, threads in different blocks can only communicate via global memory accesses. Programmers wishing to exploit such communication have to consider data races that may occur when multiple threads modify the same memory location. Recent GPUs provide a form