Method and apparatus for selective and power-aware memory error protection and memory management
Abstract
A method for providing selective memory error protection responsive to a predictable failure notification associated with at least one portion of a memory in a computing system includes: obtaining an active error correcting code (ECC) configuration corresponding to the portion of the memory; determining whether the active ECC configuration is sufficient to correct at least one error in the portion of the memory affected by the predictable failure notification; when the active ECC configuration is insufficient to correct the error, determining whether data corruption can be tolerated by an application running on the computing system; when data corruption cannot be tolerated by the application, determining whether a stronger ECC level is available and, if a stronger ECC level is available, increasing a strength of the active ECC configuration; and when data corruption can be tolerated, performing page reassignment and aggregation of non-critical data.
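The decision flow in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the `Region` fields, the `ECC_CORRECTABLE_BITS` table, the `Action` names, and the choice to raise ECC by exactly one level are all assumptions made for the example. The abstract does not say what happens when corruption cannot be tolerated and no stronger ECC level exists, so that case is reported as unspecified.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    KEEP_CURRENT_ECC = "keep current ECC"
    INCREASE_ECC = "increase ECC strength"
    REASSIGN_AND_AGGREGATE = "reassign pages and aggregate non-critical data"
    UNSPECIFIED = "not covered by the abstract"


# Hypothetical ECC levels: correctable bits per codeword at each level.
ECC_CORRECTABLE_BITS = [0, 1, 2, 4]


@dataclass
class Region:
    """Hypothetical model of one memory portion named in a failure notification."""
    active_ecc_level: int       # index into ECC_CORRECTABLE_BITS
    predicted_error_bits: int   # error size implied by the notification (assumed)
    corruption_tolerated: bool  # can the application tolerate corruption here?


def handle_predicted_failure(region: Region) -> Action:
    # Steps 1-2: obtain the active ECC configuration and check whether it suffices.
    correctable = ECC_CORRECTABLE_BITS[region.active_ecc_level]
    if region.predicted_error_bits <= correctable:
        return Action.KEEP_CURRENT_ECC

    # Step 3: ECC is insufficient, so ask whether corruption can be tolerated.
    if region.corruption_tolerated:
        return Action.REASSIGN_AND_AGGREGATE

    # Step 4: corruption is not tolerable; strengthen ECC if a stronger level exists.
    if region.active_ecc_level + 1 < len(ECC_CORRECTABLE_BITS):
        region.active_ecc_level += 1  # assumption: step up by one level
        return Action.INCREASE_ECC

    # The abstract does not describe this case.
    return Action.UNSPECIFIED
```

For example, `handle_predicted_failure(Region(1, 2, False))` takes the ECC-strengthening branch, while the same region with `corruption_tolerated=True` takes the page reassignment and aggregation branch.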
- Inventors:
- Andrade Costa, Carlos H.; Cher, Chen-Yong; Park, Yoonho; Rosenburg, Bryan S.; Ryu, Kyung D.
- Issue Date:
- Research Org.:
- International Business Machines Corp., Armonk, NY (United States)
- Sponsoring Org.:
- USDOE
- OSTI Identifier:
- 1495044
- Patent Number(s):
- 10141955
- Application Number:
- 14/684,368
- Assignee:
- International Business Machines Corporation (Armonk, NY)
- Patent Classifications (CPCs):
- G - PHYSICS; G06 - COMPUTING; G06F - ELECTRIC DIGITAL DATA PROCESSING
- H - ELECTRICITY; H03 - BASIC ELECTRONIC CIRCUITRY; H03M - CODING
- DOE Contract Number:
- B599858
- Resource Type:
- Patent
- Resource Relation:
- Patent File Date: 2015 Apr 11
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 97 MATHEMATICS AND COMPUTING
Citation Formats
Andrade Costa, Carlos H., Cher, Chen-Yong, Park, Yoonho, Rosenburg, Bryan S., and Ryu, Kyung D. Method and apparatus for selective and power-aware memory error protection and memory management. United States: N. p., 2018. Web.
Andrade Costa, Carlos H., Cher, Chen-Yong, Park, Yoonho, Rosenburg, Bryan S., & Ryu, Kyung D. Method and apparatus for selective and power-aware memory error protection and memory management. United States.
Andrade Costa, Carlos H., Cher, Chen-Yong, Park, Yoonho, Rosenburg, Bryan S., and Ryu, Kyung D. Tue . "Method and apparatus for selective and power-aware memory error protection and memory management". United States. https://www.osti.gov/servlets/purl/1495044.
@article{osti_1495044,
title = {Method and apparatus for selective and power-aware memory error protection and memory management},
author = {Andrade Costa, Carlos H. and Cher, Chen-Yong and Park, Yoonho and Rosenburg, Bryan S. and Ryu, Kyung D.},
abstractNote = {A method for providing selective memory error protection responsive to a predictable failure notification associated with at least one portion of a memory in a computing system includes: obtaining an active error correcting code (ECC) configuration corresponding to the portion of the memory; determining whether the active ECC configuration is sufficient to correct at least one error in the portion of the memory affected by the predictable failure notification; when the active ECC configuration is insufficient to correct the error, determining whether data corruption can be tolerated by an application running on the computing system; when data corruption cannot be tolerated by the application, determining whether a stronger ECC level is available and, if a stronger ECC level is available, increasing a strength of the active ECC configuration; and when data corruption can be tolerated, performing page reassignment and aggregation of non-critical data.},
doi = {},
journal = {},
number = {},
volume = {},
place = {United States},
year = {2018},
month = {11}
}
Works referenced in this record:
Measurement-based analysis of fault and error sensitivities of dynamic memory
conference, June 2010
- Yim, Keun Soo; Kalbarczyk, Zbigniew; Iyer, Ravishankar K.
- 2010 IEEE/IFIP International Conference on Dependable Systems & Networks (DSN)
Exploring event correlation for failure prediction in coalitions of clusters
conference, January 2007
- Fu, Song; Xu, Cheng-Zhong
- Proceedings of the 2007 ACM/IEEE conference on Supercomputing - SC '07
System and method for exchanging data
patent, June 2004
- Darcy, Paul B.; DeLuca, Steven A.
- US Patent Document 6,748,445
Flikker: saving DRAM refresh-power through critical data partitioning
conference, January 2011
- Liu, Song; Pattabiraman, Karthik; Moscibroda, Thomas
- Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems - ASPLOS '11
Energy-efficient cache design using variable-strength error-correcting codes
conference, January 2011
- Alameldeen, Alaa R.; Wagner, Ilya; Chishti, Zeshan
- Proceeding of the 38th annual international symposium on Computer architecture - ISCA '11
SECRET: Selective error correction for refresh energy reduction in DRAMs
conference, September 2012
- Lin, Chung-Hsiang; Shen, De-Yu; Chen, Yi-Jung
- 2012 IEEE 30th International Conference on Computer Design (ICCD 2012)
SafeMem: Exploiting ECC-Memory for Detecting Memory Leaks and Memory Corruption During Production Runs
conference, January 2005
- Qin, Feng
- 11th International Symposium on High-Performance Computer Architecture
Method and apparatus for using wear-out blocks in nonvolatile memory
patent, August 2016
- Song, Jong-uk
- US Patent Document 9,430,339
MAGE: Adaptive Granularity and ECC for resilient and power efficient memory systems
conference, November 2012
- Li, Sheng; Yoon, Doe Hyun; Chen, Ke
- 2012 SC - International Conference for High Performance Computing, Networking, Storage and Analysis