U.S. Department of Energy
Office of Scientific and Technical Information

Locality-Aware CTA Clustering For Modern GPUs

Conference

In this paper, we proposed a novel clustering technique for tapping into the performance potential of a largely ignored type of locality: inter-CTA locality. We first demonstrated that existing GPU hardware can exploit such locality, both spatially and temporally, in the L1 or L1/Tex unified cache. To verify the potential of this locality, we quantified its presence across a broad spectrum of applications and discussed its sources. Based on these insights, we proposed the concept of CTA-Clustering and its associated software techniques. Finally, we evaluated these techniques on all modern generations of NVIDIA GPU architectures. The experimental results showed that the proposed clustering techniques could significantly improve on-chip cache performance.

Research Organization:
Pacific Northwest National Laboratory (PNNL), Richland, WA (US)
Sponsoring Organization:
USDOE
DOE Contract Number:
AC05-76RL01830
OSTI ID:
1355097
Report Number(s):
PNNL-SA-123050; KJ0402000
Country of Publication:
United States
Language:
English

Similar Records

Locality-Driven Dynamic GPU Cache Bypassing
Conference · Sun Jun 07 2015 · OSTI ID:1194296

RACB: Resource Aware Cache Bypass on GPUs
Conference · Wed Oct 01 2014 · 2014 International Symposium on Computer Architecture and High Performance Computing Workshop; 22-24 Oct. 2014; Paris, France · OSTI ID:1567596

Critical Points Based Register-Concurrency Autotuning for GPUs
Conference · Mon Mar 14 2016 · OSTI ID:1253875