Tolerating memory stack failures in multi-stack systems
Abstract
Memory management circuitry and processes operate to improve reliability of a group of memory stacks, providing that if a memory stack or a portion thereof fails during the product's lifetime, the system may still recover with no errors or data loss. A front-end controller receives a block of data requested to be written to memory, divides the block into sub-blocks, and creates a new redundant reliability sub-block. The sub-blocks are then written to different memory stacks. When reading data from the memory stacks, the front-end controller detects errors indicating a failure within one of the memory stacks, and recovers corrected data using the reliability sub-block. The front-end controller may monitor errors for signs of a stack failure and disable the failed stack.
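The write and recovery flow described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how the reliability sub-block is computed; the sketch assumes simple XOR parity in the style of RAID, which tolerates the loss of one sub-block. The function names (`split_with_parity`, `recover_block`) and the use of `None` to mark a sub-block from a failed stack are hypothetical, chosen for illustration only, and error detection itself is assumed to happen elsewhere.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def split_with_parity(block: bytes, num_data_stacks: int) -> List[bytes]:
    """Divide a block into equal sub-blocks and append one parity sub-block.

    Assumption: the reliability sub-block is XOR parity. Each returned
    element is intended for a different memory stack.
    """
    if num_data_stacks < 1 or len(block) % num_data_stacks != 0:
        raise ValueError("block length must be a multiple of num_data_stacks")
    size = len(block) // num_data_stacks
    sub_blocks = [block[i * size:(i + 1) * size] for i in range(num_data_stacks)]
    parity = reduce(xor_bytes, sub_blocks)
    return sub_blocks + [parity]


def recover_block(sub_blocks: List[Optional[bytes]]) -> bytes:
    """Reassemble the original block, rebuilding at most one lost sub-block.

    A sub-block read from a stack flagged as failed is passed as None
    (hypothetical convention). The last element is the parity sub-block.
    """
    missing = [i for i, s in enumerate(sub_blocks) if s is None]
    if len(missing) > 1:
        raise ValueError("XOR parity can rebuild only one lost sub-block")
    restored = list(sub_blocks)
    if missing:
        survivors = [s for s in sub_blocks if s is not None]
        # XOR of all surviving sub-blocks (data and parity) yields the lost one.
        restored[missing[0]] = reduce(xor_bytes, survivors)
    return b"".join(restored[:-1])  # drop parity, keep data sub-blocks


if __name__ == "__main__":
    block = bytes(range(64))
    stripes = split_with_parity(block, num_data_stacks=4)
    damaged = list(stripes)
    damaged[2] = None  # simulate a failed stack
    print(recover_block(damaged) == block)
```

With four data stacks and one parity stack, this scheme costs one extra sub-block per block written; a real controller may use a different code or layout.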
- Inventors:
- Mappouras, Georgios; Farahani, Amin Farmahini; Ignatowski, Michael
- Issue Date:
- Research Org.:
- Lawrence Livermore National Laboratory (LLNL), Livermore, CA (United States); Advanced Micro Devices, Inc., Santa Clara, CA (United States)
- Sponsoring Org.:
- USDOE
- OSTI Identifier:
- 1986787
- Patent Number(s):
- 11494087
- Application Number:
- 16/175,926
- Assignee:
- Advanced Micro Devices, Inc. (Santa Clara, CA)
- DOE Contract Number:
- AC52-07NA27344; B620717
- Resource Type:
- Patent
- Resource Relation:
- Patent File Date: 10/31/2018
- Country of Publication:
- United States
- Language:
- English
Citation Formats
Mappouras, Georgios, Farahani, Amin Farmahini, and Ignatowski, Michael. Tolerating memory stack failures in multi-stack systems. United States: N. p., 2022. Web.
Mappouras, Georgios, Farahani, Amin Farmahini, & Ignatowski, Michael. Tolerating memory stack failures in multi-stack systems. United States.
Mappouras, Georgios, Farahani, Amin Farmahini, and Ignatowski, Michael. 2022. "Tolerating memory stack failures in multi-stack systems". United States. https://www.osti.gov/servlets/purl/1986787.
@article{osti_1986787,
title = {Tolerating memory stack failures in multi-stack systems},
author = {Mappouras, Georgios and Farahani, Amin Farmahini and Ignatowski, Michael},
abstractNote = {Memory management circuitry and processes operate to improve reliability of a group of memory stacks, providing that if a memory stack or a portion thereof fails during the product's lifetime, the system may still recover with no errors or data loss. A front-end controller receives a block of data requested to be written to memory, divides the block into sub-blocks, and creates a new redundant reliability sub-block. The sub-blocks are then written to different memory stacks. When reading data from the memory stacks, the front-end controller detects errors indicating a failure within one of the memory stacks, and recovers corrected data using the reliability sub-block. The front-end controller may monitor errors for signs of a stack failure and disable the failed stack.},
doi = {},
journal = {},
number = {},
volume = {},
place = {United States},
year = {2022},
month = {11}
}
Works referenced in this record:
Jenga: Efficient Fault Tolerance for Stacked DRAM
conference, November 2017
- Mappouras, Georgios; Vahid, Alireza; Calderbank, Robert
- 2017 IEEE International Conference on Computer Design (ICCD)
Parity Helix: Efficient protection for single-dimensional faults in multi-dimensional memory systems
conference, March 2016
- Jian, Xun; Sridharan, Vilas; Kumar, Rakesh
- 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA)
Efficient RAS support for die-stacked DRAM
conference, October 2014
- Jeon, Hyeran; Loh, Gabriel H.; Annavaram, Murali
- 2014 International Test Conference
Memory System
patent-application, September 2006
- Porat, Ofer; Campbell, Brian K.; Magnuson, Brian D.
- US Patent Application 11/083571; 20060212622
Method of Storing Blocks of Data in a Plurality of Memory Devices in a Redundant Manner, A Memory Controller and a Memory System
patent-application, May 2012
- Arya, Siamak
- US Patent Application 12/941926; 20120117444
Increased Redundancy in Multi-Device Memory Package to Improve Reliability
patent-application, May 2018
- Wu, Wei; Kang, Uksong; Alameer, Hussein
- US Patent Application 15/814336; 20180137005
Bridge interfacing two processing sets operating in a lockstep mode and having a posted write buffer storing write operations upon detection of a lockstep error
patent, November 2000
- Garnett, Paul J.; Rowlinson, Stephen; Oyelakin, Femi A.
- US Patent Document 6,148,348
High Reliability Memory Controller
patent-application, April 2014
- Loh, Gabriel H.; Sridharan, Vilas K.
- US Patent Application 13/649745; 20140108885
Integral Post Package Repair
patent-application, November 2017
- Brandl, Kevin M.
- US Patent Application 15/168045; 20170344421
A case for redundant arrays of inexpensive disks (RAID)
journal, June 1988
- Patterson, David A.; Gibson, Garth; Katz, Randy H.
- ACM SIGMOD Record, Vol. 17, Issue 3
RATT-ECC
journal, September 2016
- Chen, Hsing-Min; Wu, Carole-Jean; Mudge, Trevor
- ACM Transactions on Architecture and Code Optimization, Vol. 13, Issue 3
High reliability memory controller
patent, March 2015
- Loh, Gabriel H.; Sridharan, Vilas
- US Patent Document 8,984,368