DOE Patents, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Extreme-bandwidth scalable performance-per-watt GPU architecture

Abstract

A technique for accessing memory in an accelerated processing device coupled to stacked memory dies is provided herein. The technique includes receiving a memory access request from an execution unit and identifying whether the memory access request corresponds to memory cells of the stacked dies that are considered local to the execution unit or non-local. For local accesses, the access is made “directly”, that is, without using a bus. A control die coordinates operations for such local accesses, activating particular through-silicon-vias associated with the memory cells that include the data for the access. Non-local accesses are made via a distributed cache fabric and an interconnect bus in the control die. Various other features and details are provided below.
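The local versus non-local split described in the abstract can be illustrated with a minimal routing sketch. Everything concrete here is an assumption that the record does not state: the number of memory slices, the address-interleaving scheme, the rule that execution unit i is local to slice i, the TSV-group selection, and the names `StackConfig`, `slice_of`, and `route_access` are all hypothetical. The sketch only shows the decision: local requests go "directly" through a specific TSV group activated by the control die, and everything else goes through the distributed cache fabric and the interconnect bus.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Path(Enum):
    LOCAL_TSV = "local-tsv"          # direct access through TSVs, no bus
    REMOTE_FABRIC = "remote-fabric"  # distributed cache fabric + interconnect bus


@dataclass(frozen=True)
class StackConfig:
    # Hypothetical parameters; the patent abstract does not specify any of them.
    num_slices: int = 8           # vertical memory slices, assumed one per execution unit
    interleave_bytes: int = 4096  # assumed address-interleave granularity across slices
    line_bytes: int = 64          # assumed access granularity within a slice
    tsvs_per_slice: int = 4       # assumed number of TSV groups serving each slice


@dataclass(frozen=True)
class Route:
    path: Path
    slice_id: int
    tsv_group: Optional[int]  # set only for local accesses


def slice_of(addr: int, cfg: StackConfig) -> int:
    """Map a physical address to the slice holding it (assumed simple interleaving)."""
    return (addr // cfg.interleave_bytes) % cfg.num_slices


def route_access(eu_id: int, addr: int, cfg: StackConfig) -> Route:
    """Classify a request from execution unit `eu_id` as local or non-local.

    Assumption: execution unit i is considered local to slice i.
    """
    target = slice_of(addr, cfg)
    if target == eu_id % cfg.num_slices:
        # Local: the control die activates only the TSV group that reaches the
        # memory cells holding this address (group picked from lower address bits).
        group = (addr // cfg.line_bytes) % cfg.tsvs_per_slice
        return Route(Path.LOCAL_TSV, target, group)
    # Non-local: hand the request to the distributed cache fabric, which
    # reaches the remote slice over the control die's interconnect bus.
    return Route(Path.REMOTE_FABRIC, target, None)


if __name__ == "__main__":
    cfg = StackConfig()
    print(route_access(2, 2 * 4096 + 128, cfg))  # address in slice 2: local
    print(route_access(2, 5 * 4096, cfg))        # address in slice 5: non-local
```

Keying locality purely on address bits keeps the decision cheap enough to make before any bus arbitration, which is the point of bypassing the bus for local data; a real design could use a different mapping.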

Inventors:
Yudanov, Dmitri; Chen, Jiasheng
Issue Date:
December 17, 2019
Research Org.:
Lawrence Livermore National Laboratory (LLNL), Livermore, CA (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1600412
Patent Number(s):
10509596
Application Number:
15/851,476
Assignee:
Advanced Micro Devices, Inc. (Santa Clara, CA)
Patent Classifications (CPCs):
G - PHYSICS G06 - COMPUTING G06F - ELECTRIC DIGITAL DATA PROCESSING
G - PHYSICS G06 - COMPUTING G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
DOE Contract Number:  
AC52-07NA27344; B620717
Resource Type:
Patent
Resource Relation:
Patent File Date: 12/21/2017
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING

Citation Formats

Yudanov, Dmitri, and Chen, Jiasheng. Extreme-bandwidth scalable performance-per-watt GPU architecture. United States: N. p., 2019. Web.
Yudanov, Dmitri, & Chen, Jiasheng. Extreme-bandwidth scalable performance-per-watt GPU architecture. United States.
Yudanov, Dmitri, and Chen, Jiasheng. 2019. "Extreme-bandwidth scalable performance-per-watt GPU architecture". United States. https://www.osti.gov/servlets/purl/1600412.
@misc{osti_1600412,
title = {Extreme-bandwidth scalable performance-per-watt GPU architecture},
author = {Yudanov, Dmitri and Chen, Jiasheng},
abstractNote = {A technique for accessing memory in an accelerated processing device coupled to stacked memory dies is provided herein. The technique includes receiving a memory access request from an execution unit and identifying whether the memory access request corresponds to memory cells of the stacked dies that are considered local to the execution unit or non-local. For local accesses, the access is made “directly”, that is, without using a bus. A control die coordinates operations for such local accesses, activating particular through-silicon-vias associated with the memory cells that include the data for the access. Non-local accesses are made via a distributed cache fabric and an interconnect bus in the control die. Various other features and details are provided below.},
note = {U.S. Patent 10,509,596},
url = {https://www.osti.gov/servlets/purl/1600412},
place = {United States},
year = {2019},
month = {dec}
}

Works referenced in this record:

Three-Dimensional Chip-Based Regular Expression Scanner
patent-application, March 2017

Harmonica: An FPGA-Based Data Parallel Soft Core
conference, May 2014

  • Kersey, Chad; Yalamanchili, Sudhakar; Kim, Hyojong
  • 2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines
  • https://doi.org/10.1109/FCCM.2014.53

Exploring DRAM organizations for energy-efficient and resilient exascale memories
conference, November 2013

  • Giridhar, Bharan; Cieslak, Michael; Duggal, Deepankar
  • Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
  • https://doi.org/10.1145/2503210.2503215

3D-Integrated SRAM Components for High-Performance Microprocessors
journal, October 2009

A case for exploiting subarray-level parallelism (SALP) in DRAM
conference, June 2012