Locality-Driven Dynamic GPU Cache Bypassing
This paper presents novel cache optimizations for massively parallel, throughput-oriented architectures such as GPUs. Based on the reuse characteristics of GPU workloads, we propose a design that integrates an efficient locality-filtering capability into the decoupled tag store of the existing L1 D-cache through simple and cost-effective hardware extensions.
- Resource Relation: Conference: Proceedings of the 29th ACM on International Conference on Supercomputing (ICS 2015), June 8-11, 2015, Newport Beach, California, 66-77
- ACM, New York, NY, United States (US).
- Research Org: Pacific Northwest National Laboratory (PNNL), Richland, WA (US)
- Country of Publication: United States
- architecture optimization; reuse; performance; energy; locality; cache