skip to main content

Title: Blackcomb: Hardware-Software Co-design for Non-Volatile Memory in Exascale Systems

Summary of technical results of Blackcomb Memory Devices We explored various different memory technologies (STTRAM, PCRAM, FeRAM, and ReRAM). The progress can be classified into three categories, below. Modeling and Tool Releases Various modeling tools have been developed over the last decade to help in the design of SRAM or DRAM-based memory hierarchies. To explore new design opportunities that NVM technologies can bring to the designers, we have developed similar high-level models for NVM, including PCRAMsim [Dong 2009], NVSim [Dong 2012], and NVMain [Poremba 2012]. NVSim is a circuit-level model for NVM performance, energy, and area estimation, which supports various NVM technologies, including STT-RAM, PCRAM, ReRAM, and legacy NAND Flash. NVSim is successfully validated against industrial NVM prototypes, and it is expected to help boost architecture-level NVM-related studies. On the other side, NVMain is a cycle accurate main memory simulator designed to simulate emerging nonvolatile memories at the architectural level. We have released these models as open source tools and provided contiguous support to them. We also proposed PS3-RAM, which is a fast, portable and scalable statistical STT-RAM reliability analysis model [Wen 2012]. Design Space Exploration and Optimization With the support of these models, we explore different device/circuit optimization techniques.more » For example, in [Niu 2012a] we studied the power reduction technique for the application of ECC scheme in ReRAM designs and proposed to use ECC code to relax the BER (Bit Error Rate) requirement of a single memory to improve the write energy consumption and latency for both 1T1R and cross-point ReRAM designs. In [Xu 2011], we proposed a methodology to design STT-RAM for different optimization goals such as read performance, write performance and write energy by leveraging the trade-off between write current and write time of MTJ. We also studied the tradeoffs in building a reliable crosspoint ReRAM array [Niu 2012b]. We have conducted an in depth analysis of the circuit and system level design implications of multi-layer cross-point Resistive RAM (MLCReRAM) from performance, power and reliability perspectives [Xu 2013]. The objective of this study is to understand the design trade-offs of this technology with respect to the MLC Phase Change Memory (MLCPCM).Our MLC ReRAM design at the circuit and system levels indicates that different resistance allocation schemes, programming strategies, peripheral designs, and material selections profoundly affect the area, latency, power, and reliability of MLC ReRAM. Based on this analysis, we conduct two case studies: first we compare MLC ReRAM design against MLC phase-change memory (PCM) and multi-layer cross-point ReRAM design, and point out why multi-level ReRAM is appealing; second we further explore the design space for MLC ReRAM. Architecture and Application We explored hybrid checkpointing using phase-change memory for future exascale systems [Dong 2011] and showed that the use of nonvolatile memory for local checkpointing significantly increases the number of faults covered by local checkpoints and reduces the probability of a global failure in the middle of a global checkpoint to less than 1%. We also proposed a technique called i2WAP to mitigate the write variations in NVM-based last-level cache for the improvement of the NVM lifetime [Wang 2013]. Our wear leveling technique attempts to work around the limitations of write endurance by arranging data access so that write operations can be distributed evenly across all the storage cells. During our intensive research on fault-tolerant NVM design, we found that ECC cannot effectively tolerate hard errors from limited write endurance and process imperfection. Therefore, we devised a novel Point and Discard (PAD) architecture in in [ 2012] as a hard-error-tolerant architecture for ReRAM-based Last Level Caches. PAD improves the lifetime of ReRAM caches by 1.6X-440X under different process variations without performance overhead in the system's early life. We have investigated the applicability of NVM for persistent memory design [Zhao 2013]. New byte addressable NVM enables fast persistent memory that allows in-memory persistent data objects to be updated with much higher throughput. Despite the significant improvement, the performance of these designs is only 50% of the native system with no persistence support, due to the logging or copy-on-write mechanisms used to update the persistent memory. A challenge in this approach is therefore how to efficiently enable atomic, consistent, and durable updates to ensure data persistence that survives application and/or system failures. We have designed a persistent memory system, called Klin, that can provide performance as close as that of the native system. The Klin design adopts a non-volatile cache and a non-volatile main memory for constructing a multi-versioned durable memory system, enabling atomic updates without logging or copy-on-write. Our evaluation shows that the proposed Kiln mechanism can achieve up to 2X of performance improvement to NVRAM-based persistent memory employing write-ahead logging. In addition, our design has numerous practical advantages: a simple and intuitive abstract interface, microarchitecture-level optimizations, fast recovery from failures, and no redundant writes to slow non-volatile storage media. The work was published in MICRO 2013 and received Best Paper Honorable Mentioned Award.« less
  1. Hewlett Packard Laboratories, Palo Alto, CA (United States)
Publication Date:
OSTI Identifier:
Report Number(s):
DOE Contract Number:
Resource Type:
Technical Report
Research Org:
Hewlett Packard Laboratories, Palo Alto, CA (United States)
Sponsoring Org:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR) (SC-21)
Country of Publication:
United States