Locality-Aware CTA Clustering For Modern GPUs
In this paper, we proposed a novel clustering technique for tapping into the performance potential of a largely ignored type of locality: inter-CTA locality. We first demonstrated that existing GPU hardware can exploit such locality, both spatially and temporally, in the L1 cache or the unified L1/Tex cache. To verify the potential of this locality, we quantified its presence in a broad spectrum of applications and discussed its sources. Based on these insights, we proposed the concept of CTA-Clustering and its associated software techniques. Finally, we evaluated these techniques on all modern generations of NVIDIA GPU architectures. The experimental results showed that our proposed clustering techniques could significantly improve on-chip cache performance.
- Research Organization: Pacific Northwest National Laboratory (PNNL), Richland, WA (US)
- Sponsoring Organization: USDOE
- DOE Contract Number: AC05-76RL01830
- OSTI ID: 1355097
- Report Number(s): PNNL-SA-123050; KJ0402000
- Country of Publication: United States
- Language: English
Similar Records
Locality-Driven Dynamic GPU Cache Bypassing
Conference · Sun Jun 07 00:00:00 EDT 2015 · OSTI ID:1194296
RACB: Resource Aware Cache Bypass on GPUs
Conference · Wed Oct 01 00:00:00 EDT 2014 · 2014 International Symposium on Computer Architecture and High Performance Computing Workshop; 22-24 Oct. 2014; Paris, France · OSTI ID:1567596
Critical Points Based Register-Concurrency Autotuning for GPUs
Conference · Mon Mar 14 00:00:00 EDT 2016 · OSTI ID:1253875