DOE Patents, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Tolerating memory stack failures in multi-stack systems

Abstract

Memory management circuitry and processes operate to improve reliability of a group of memory stacks, providing that if a memory stack or a portion thereof fails during the product's lifetime, the system may still recover with no errors or data loss. A front-end controller receives a block of data requested to be written to memory, divides the block into sub-blocks, and creates a new redundant reliability sub-block. The sub-blocks are then written to different memory stacks. When reading data from the memory stacks, the front-end controller detects errors indicating a failure within one of the memory stacks, and recovers corrected data using the reliability sub-block. The front-end controller may monitor errors for signs of a stack failure and disable the failed stack.
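The write and read paths described in the abstract can be sketched in a few lines of Python. This is only an illustrative model, not the patented implementation. The abstract does not say how the reliability sub-block is computed, so the sketch assumes a RAID-style XOR parity across four data stacks plus one parity stack. The class name `FrontEndController`, the `inject_failure` helper, the `failure_threshold` parameter, and the dictionary-backed "stacks" are all hypothetical names introduced for the example. A real controller would detect a bad sub-block through per-stack error detection such as ECC or CRC, which the sketch replaces with a simple "faulty" flag.

```python
from functools import reduce


def xor_bytes(a, b):
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def split_block(block, n):
    """Divide a block into n equal-sized sub-blocks."""
    if len(block) % n != 0:
        raise ValueError("block length must be a multiple of the stack count")
    size = len(block) // n
    return [block[i * size:(i + 1) * size] for i in range(n)]


class FrontEndController:
    """Toy model: n data stacks plus one stack for the reliability sub-block.

    ASSUMPTION: the reliability sub-block is an XOR parity of the data
    sub-blocks. The source abstract does not specify the code used.
    """

    def __init__(self, num_data_stacks=4, failure_threshold=3):
        self.n = num_data_stacks
        self.stacks = [dict() for _ in range(num_data_stacks + 1)]
        self.error_counts = [0] * (num_data_stacks + 1)
        self.failure_threshold = failure_threshold
        self.faulty = set()    # simulated hardware state (hypothetical)
        self.disabled = set()  # stacks the controller has stopped using

    def inject_failure(self, stack_id):
        """Simulation helper: make every read from this stack report an error."""
        self.faulty.add(stack_id)

    def write(self, addr, block):
        subs = split_block(block, self.n)
        subs.append(reduce(xor_bytes, subs))  # redundant reliability sub-block
        for stack_id, sub in enumerate(subs):
            if stack_id not in self.disabled:
                self.stacks[stack_id][addr] = sub  # one sub-block per stack

    def _read_sub(self, stack_id, addr):
        """Return the sub-block, or None if the stack is disabled or errors."""
        if stack_id in self.disabled:
            return None
        if stack_id in self.faulty:
            # Monitor errors; disable the stack once they look persistent.
            self.error_counts[stack_id] += 1
            if self.error_counts[stack_id] >= self.failure_threshold:
                self.disabled.add(stack_id)
            return None
        return self.stacks[stack_id][addr]

    def read(self, addr):
        subs = [self._read_sub(s, addr) for s in range(self.n + 1)]
        missing = [s for s, sub in enumerate(subs) if sub is None]
        if len(missing) > 1:
            raise RuntimeError("single parity cannot recover two lost sub-blocks")
        if missing:
            # XOR of all surviving sub-blocks rebuilds the lost one.
            survivors = [sub for sub in subs if sub is not None]
            subs[missing[0]] = reduce(xor_bytes, survivors)
        return b"".join(subs[:self.n])
```

In this model, writing a 64-byte block, calling `inject_failure(2)`, and reading the block back still returns the original data, because the lost sub-block is rebuilt from the three surviving data sub-blocks and the parity sub-block. After `failure_threshold` failed reads the stack is added to `disabled`, and later writes skip it while remaining recoverable. A single XOR parity tolerates only one failed stack per block; tolerating more would need a stronger code.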

Inventors:
Mappouras, Georgios; Farahani, Amin Farmahini; Ignatowski, Michael
Issue Date:
November 2022
Research Org.:
Lawrence Livermore National Laboratory (LLNL), Livermore, CA (United States); Advanced Micro Devices, Inc., Santa Clara, CA (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1986787
Patent Number(s):
11494087
Application Number:
16/175,926
Assignee:
Advanced Micro Devices, Inc. (Santa Clara, CA)
DOE Contract Number:  
AC52-07NA27344; B620717
Resource Type:
Patent
Resource Relation:
Patent File Date: 10/31/2018
Country of Publication:
United States
Language:
English

Citation Formats

Mappouras, Georgios, Farahani, Amin Farmahini, and Ignatowski, Michael. Tolerating memory stack failures in multi-stack systems. United States: N. p., 2022. Web.
Mappouras, Georgios, Farahani, Amin Farmahini, & Ignatowski, Michael. Tolerating memory stack failures in multi-stack systems. United States.
Mappouras, Georgios, Farahani, Amin Farmahini, and Ignatowski, Michael. 2022. "Tolerating memory stack failures in multi-stack systems". United States. https://www.osti.gov/servlets/purl/1986787.
@article{osti_1986787,
title = {Tolerating memory stack failures in multi-stack systems},
author = {Mappouras, Georgios and Farahani, Amin Farmahini and Ignatowski, Michael},
abstractNote = {Memory management circuitry and processes operate to improve reliability of a group of memory stacks, providing that if a memory stack or a portion thereof fails during the product's lifetime, the system may still recover with no errors or data loss. A front-end controller receives a block of data requested to be written to memory, divides the block into sub-blocks, and creates a new redundant reliability sub-block. The sub-blocks are then written to different memory stacks. When reading data from the memory stacks, the front-end controller detects errors indicating a failure within one of the memory stacks, and recovers corrected data using the reliability sub-block. The front-end controller may monitor errors for signs of a stack failure and disable the failed stack.},
doi = {},
journal = {},
number = {},
volume = {},
place = {United States},
year = {2022},
month = {11}
}

Works referenced in this record:

Jenga: Efficient Fault Tolerance for Stacked DRAM (conference, November 2017)
Parity Helix: Efficient protection for single-dimensional faults in multi-dimensional memory systems (conference, March 2016)
Efficient RAS support for die-stacked DRAM (conference, October 2014)
Memory System (patent-application, September 2006)
Increased Redundancy in Multi-Device Memory Package to Improve Reliability (patent-application, May 2018)
High Reliability Memory Controller (patent-application, April 2014)
Integral Post Package Repair (patent-application, November 2017)
A case for redundant arrays of inexpensive disks (RAID) (journal, June 1988)
Ratt-Ecc (journal, September 2016)
High reliability memory controller (patent, March 2015)