Extreme-bandwidth scalable performance-per-watt GPU architecture

Yudanov, Dmitri; Chen, Jiasheng

Title: Extreme-bandwidth scalable performance-per-watt GPU architecture

Abstract

A technique for accessing memory in an accelerated processing device coupled to stacked memory dies is provided herein. The technique includes receiving a memory access request from an execution unit and identifying whether the memory access request corresponds to memory cells of the stacked dies that are considered local to the execution unit or non-local. For local accesses, the access is made “directly”, that is, without using a bus. A control die coordinates operations for such local accesses, activating particular through-silicon-vias associated with the memory cells that include the data for the access. Non-local accesses are made via a distributed cache fabric and an interconnect bus in the control die. Various other features and details are provided below.
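
The local versus non-local decision described in the abstract can be illustrated with a minimal sketch. The abstract does not say how addresses map to execution units, so the contiguous-slice layout, the slice size, and every name below (`SLICE_BYTES`, `MemoryRequest`, `owning_unit`, `route`) are assumptions made purely for illustration, not the patented mechanism.

```python
from dataclasses import dataclass

# Assumption: each execution unit owns one contiguous slice of the stacked
# memory. The abstract does not specify the real address mapping.
SLICE_BYTES = 1 << 20  # hypothetical size of the region local to one unit


@dataclass(frozen=True)
class MemoryRequest:
    unit_id: int  # execution unit issuing the request
    address: int  # byte address within the stacked memory


def owning_unit(address: int) -> int:
    """Return the execution unit whose (assumed) local slice holds `address`."""
    return address // SLICE_BYTES


def route(request: MemoryRequest) -> str:
    """Pick the access path for a request, following the abstract.

    Local requests take the direct path, where the control die activates the
    through-silicon vias for the target cells and no bus is used. All other
    requests go through the distributed cache fabric and the interconnect bus.
    """
    if owning_unit(request.address) == request.unit_id:
        return "direct-tsv"
    return "cache-fabric+bus"
```

Under these assumptions, `route(MemoryRequest(unit_id=0, address=100))` selects the direct path, while the same unit requesting an address in another unit's slice is sent over the cache fabric and bus.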

Inventors: Yudanov, Dmitri; Chen, Jiasheng

Issue Date: December 17, 2019

Research Org.: Lawrence Livermore National Laboratory (LLNL), Livermore, CA (United States)

Sponsoring Org.: USDOE

OSTI Identifier: 1600412

Patent Number(s): 10509596

Application Number: 15/851,476

Assignee: Advanced Micro Devices, Inc. (Santa Clara, CA)

Patent Classifications (CPCs):

G - PHYSICS G06 - COMPUTING G06F - ELECTRIC DIGITAL DATA PROCESSING
G06F1/3225 - of memory devices
G06F1/3275 - {Power saving in memory, e.g. RAM, cache}
G06F11/3037 - {where the computing system component is a memory, e.g. virtual memory, cache}
G06F11/3062 - {where the monitored property is the power consumption}
G06F12/0607 - {Interleaved addressing}
G06F12/0813 - with a network or matrix configuration
G06F2201/81 - Threshold
G06F3/0604 - {Improving or facilitating administration, e.g. storage management}
G06F3/0659 - {Command handling arrangements, e.g. command buffers, queues, command scheduling}
G06F3/0679 - {Non-volatile semiconductor memory device, e.g. flash memory, one time programmable memory [OTP]}
G06F9/3887 - {controlled by a single instruction, e.g. SIMD}

G - PHYSICS G06 - COMPUTING G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
G06T1/20 - Processor architectures
G06T1/60 - Memory management

Y - NEW / CROSS SECTIONAL TECHNOLOGIES Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
Y02D10/00 - Energy efficient computing

DOE Contract Number: AC52-07NA27344; B620717

Resource Type: Patent

Resource Relation: Patent File Date: 12/21/2017

Country of Publication: United States

Language: English

Subject: 97 MATHEMATICS AND COMPUTING

Citation Formats

Yudanov, Dmitri, and Chen, Jiasheng. Extreme-bandwidth scalable performance-per-watt GPU architecture. United States: N. p., 2019. Web.

Yudanov, Dmitri, & Chen, Jiasheng. Extreme-bandwidth scalable performance-per-watt GPU architecture. United States.

Yudanov, Dmitri, and Chen, Jiasheng. 2019. "Extreme-bandwidth scalable performance-per-watt GPU architecture". United States. https://www.osti.gov/servlets/purl/1600412.

@article{osti_1600412,
  title        = {Extreme-bandwidth scalable performance-per-watt GPU architecture},
  author       = {Yudanov, Dmitri and Chen, Jiasheng},
  abstractNote = {A technique for accessing memory in an accelerated processing device coupled to stacked memory dies is provided herein. The technique includes receiving a memory access request from an execution unit and identifying whether the memory access request corresponds to memory cells of the stacked dies that are considered local to the execution unit or non-local. For local accesses, the access is made “directly”, that is, without using a bus. A control die coordinates operations for such local accesses, activating particular through-silicon-vias associated with the memory cells that include the data for the access. Non-local accesses are made via a distributed cache fabric and an interconnect bus in the control die. Various other features and details are provided below.},
  url          = {https://www.osti.gov/servlets/purl/1600412},
  place        = {United States},
  year         = {2019},
  month        = {12}
}

Works referenced in this record:

Three-Dimensional Chip-Based Regular Expression Scanner
patent-application, March 2017

Van Lunteren, Jan; Coghlan, James; Joseph, Douglas J.
US Patent Application 14/841825; 20170061304
URL: https://image-ppubs.uspto.gov/dirsearch-public/print/downloadPdf/20170061304

Harmonica: An FPGA-Based Data Parallel Soft Core
conference, May 2014

Kersey, Chad; Yalamanchili, Sudhakar; Kim, Hyojong
2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines
https://doi.org/10.1109/FCCM.2014.53

Exploring DRAM organizations for energy-efficient and resilient exascale memories
conference, November 2013

Giridhar, Bharan; Cieslak, Michael; Duggal, Deepankar
Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
https://doi.org/10.1145/2503210.2503215

3D-Integrated SRAM Components for High-Performance Microprocessors
journal, October 2009

Puttaswamy, Kiran; Loh, Gabriel H.
IEEE Transactions on Computers, Vol. 58, Issue 10
https://doi.org/10.1109/TC.2009.92

A case for exploiting subarray-level parallelism (SALP) in DRAM
conference, June 2012

Kim, Yoongu; Seshadri, Vivek; Lee, Donghyuk
2012 39th Annual International Symposium on Computer Architecture (ISCA)
https://doi.org/10.1109/ISCA.2012.6237032
