Critical Points Based Register-Concurrency Autotuning for GPUs

Li, Ang; Song, Shuaiwen; Kumar, Akash; Zhang, Eddy; Chavarría-Miranda, Daniel; Corporaal, Henk

Conference · Mon Mar 14 00:00:00 EDT 2016

OSTI ID:1253875

The unprecedented prevalence of GPGPU is largely attributed to its abundant on-chip register resources, which allow massively concurrent threads and extremely fast context switching. However, due to internal memory capacity constraints, there is a tradeoff between per-thread register usage and overall concurrency. This becomes a design problem for performance tuning, since the performance “sweet spot”, which can be significantly affected by these two factors, is generally unknown beforehand. In this paper, we propose an effective autotuning solution to quickly and efficiently select the optimal number of registers per thread for delivering the best GPU performance. Experiments on three generations of GPUs (Nvidia Fermi, Kepler and Maxwell) demonstrate that our simple strategy achieves an average performance improvement of 10%, and up to 50%, over the original version without modifying the user program. Additionally, to reduce local cache misses due to register spilling and further improve performance, we explore three optimization schemes (i.e., bypassing L1 for global memory accesses, enlarging the local L1 cache, and spilling into shared memory) and discuss their impact on performance on a Kepler GPU.
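The register-versus-concurrency tradeoff described in the abstract can be illustrated with a small sketch. This is not the paper's algorithm, which this record does not describe. It is a minimal model, assuming roughly Kepler-class limits (65,536 registers and 64 resident warps per multiprocessor, 256-register allocation granularity per warp), that finds the per-thread register counts at which the number of resident warps drops. The names `resident_warps` and `critical_points` are hypothetical, and treating these drop-off points as the "critical points" of the title is one plausible reading, not something the record confirms.

```python
# Illustrative sketch only: the hardware limits below are assumptions
# (roughly Kepler-class), not values taken from the paper.
REGS_PER_SM = 65536       # assumed 32-bit registers per multiprocessor
MAX_WARPS_PER_SM = 64     # assumed hardware cap on resident warps
WARP_SIZE = 32            # threads per warp
ALLOC_UNIT = 256          # assumed per-warp register allocation granularity


def resident_warps(regs_per_thread: int) -> int:
    """Warps that fit on one multiprocessor if registers are the only limiter."""
    regs_per_warp = regs_per_thread * WARP_SIZE
    # Round up to the next multiple of the allocation granularity.
    regs_per_warp = -(-regs_per_warp // ALLOC_UNIT) * ALLOC_UNIT
    return min(MAX_WARPS_PER_SM, REGS_PER_SM // regs_per_warp)


def critical_points(min_regs: int = 16, max_regs: int = 255) -> list:
    """Return (registers per thread, resident warps) pairs, keeping only the
    largest register count that still reaches each concurrency level."""
    points = []
    for regs in range(min_regs, max_regs + 1):
        warps = resident_warps(regs)
        # Keep `regs` if one more register per thread would lose concurrency.
        if regs == max_regs or resident_warps(regs + 1) < warps:
            points.append((regs, warps))
    return points


if __name__ == "__main__":
    for regs, warps in critical_points():
        print(f"{regs:3d} registers/thread -> {warps:2d} resident warps")
```

Under these assumptions, an autotuner would only need to compile and time the kernel at the listed register counts (for example by capping registers with nvcc's `--maxrregcount` option) instead of trying every value, since any count between two listed points gives the same concurrency as the next listed point while allowing fewer registers.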

Research Organization: Pacific Northwest National Lab. (PNNL), Richland, WA (United States)

Sponsoring Organization: USDOE

DOE Contract Number: AC05-76RL01830

OSTI ID: 1253875

Report Number(s): PNNL-SA-114732; 400470000

Resource Relation: Conference: Proceedings of the Design, Automation and Test in Europe Conference (DATE 2016), March 14-18, 2016, Dresden, Germany, 1273-1278

Country of Publication: United States

Language: English

Similar Records

RACB: Resource Aware Cache Bypass on GPUs

Conference · Wed Oct 01 00:00:00 EDT 2014 · 2014 International Symposium on Computer Architecture and High Performance Computing Workshop; 22-24 Oct. 2014; Paris, France · OSTI ID:1253875

Dai, Hongwen; Kartsaklis, Christos; Li, Chao; +2 more

A Pattern Based Algorithmic Autotuner for Graph Processing on GPUs

Conference · Sat Feb 16 00:00:00 EST 2019 · OSTI ID:1253875

Meng, Ke; Li, Jiajia; Tan, Guangming; +1 more

Efficient GPU Implementation of Automatic Differentiation for Computational Fluid Dynamics

Conference · Mon Jul 24 00:00:00 EDT 2023 · OSTI ID:1253875

Zubair, Mohammad; Ranjan, Desh; Walden, Aaron; +6 more

Related Subjects

GPUs
register concurrency
auto-tuning
