Extreme-bandwidth scalable performance-per-watt GPU architecture
Abstract
A technique for accessing memory in an accelerated processing device coupled to stacked memory dies is provided herein. The technique includes receiving a memory access request from an execution unit and identifying whether the memory access request corresponds to memory cells of the stacked dies that are considered local to the execution unit or non-local. For local accesses, the access is made “directly”, that is, without using a bus. A control die coordinates operations for such local accesses, activating particular through-silicon-vias associated with the memory cells that include the data for the access. Non-local accesses are made via a distributed cache fabric and an interconnect bus in the control die. Various other features and details are provided below.
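The routing decision described in the abstract can be sketched as follows. This is only an illustrative model, not the patented design: the abstract does not say how locality is determined, so the fixed-size address regions, the `REGION_SIZE` constant, and the names `MemoryRequest`, `home_unit`, and `route` are all assumptions made for this example.

```python
from dataclasses import dataclass

# Assumption: each execution unit is "local" to one contiguous region of the
# stacked memory. The abstract does not specify the real mapping.
REGION_SIZE = 1024


@dataclass(frozen=True)
class MemoryRequest:
    execution_unit: int  # hypothetical ID of the requesting execution unit
    address: int         # hypothetical flat address into the stacked dies


def home_unit(address: int) -> int:
    """Return the execution unit whose local memory cells hold this address."""
    return address // REGION_SIZE


def route(request: MemoryRequest) -> str:
    """Pick the access path for a request, following the abstract's two cases."""
    if home_unit(request.address) == request.execution_unit:
        # Local: made directly, without a bus; the control die activates the
        # through-silicon vias associated with the target memory cells.
        return "direct-tsv"
    # Non-local: goes through the distributed cache fabric and the
    # interconnect bus in the control die.
    return "cache-fabric+interconnect-bus"


if __name__ == "__main__":
    print(route(MemoryRequest(execution_unit=0, address=100)))
    print(route(MemoryRequest(execution_unit=0, address=2048)))
```

In this sketch, a request from unit 0 to address 100 falls in unit 0's own region and takes the direct path, while a request from unit 0 to address 2048 belongs to another unit's region and is routed through the fabric and bus.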
- Inventors:
- Yudanov, Dmitri; Chen, Jiasheng
- Issue Date:
- Research Org.:
- Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
- Sponsoring Org.:
- USDOE
- OSTI Identifier:
- 1600412
- Patent Number(s):
- 10509596
- Application Number:
- 15/851,476
- Assignee:
- Advanced Micro Devices, Inc. (Santa Clara, CA)
- Patent Classifications (CPCs):
- G - PHYSICS; G06 - COMPUTING; G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G - PHYSICS; G06 - COMPUTING; G06F - ELECTRIC DIGITAL DATA PROCESSING
- DOE Contract Number:
- AC52-07NA27344; B620717
- Resource Type:
- Patent
- Resource Relation:
- Patent File Date: 12/21/2017
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 97 MATHEMATICS AND COMPUTING
Citation Formats
Yudanov, Dmitri, and Chen, Jiasheng. Extreme-bandwidth scalable performance-per-watt GPU architecture. United States: N. p., 2019. Web.
Yudanov, Dmitri, & Chen, Jiasheng. Extreme-bandwidth scalable performance-per-watt GPU architecture. United States.
Yudanov, Dmitri, and Chen, Jiasheng. Tue . "Extreme-bandwidth scalable performance-per-watt GPU architecture". United States. https://www.osti.gov/servlets/purl/1600412.
@article{osti_1600412,
title = {Extreme-bandwidth scalable performance-per-watt GPU architecture},
author = {Yudanov, Dmitri and Chen, Jiasheng},
abstractNote = {A technique for accessing memory in an accelerated processing device coupled to stacked memory dies is provided herein. The technique includes receiving a memory access request from an execution unit and identifying whether the memory access request corresponds to memory cells of the stacked dies that are considered local to the execution unit or non-local. For local accesses, the access is made “directly”, that is, without using a bus. A control die coordinates operations for such local accesses, activating particular through-silicon-vias associated with the memory cells that include the data for the access. Non-local accesses are made via a distributed cache fabric and an interconnect bus in the control die. Various other features and details are provided below.},
doi = {},
journal = {},
number = {},
volume = {},
place = {United States},
year = {2019},
month = {12}
}