Blackcomb: Hardware-Software Co-design for Non-Volatile Memory in Exascale Systems

Schreiber, Robert

Title: Blackcomb: Hardware-Software Co-design for Non-Volatile Memory in Exascale Systems

Technical Report · Wed Nov 26 00:00:00 EST 2014

OSTI ID:1164421

Schreiber, Robert ^[1]

Hewlett Packard Laboratories, Palo Alto, CA (United States)

Summary of technical results of Blackcomb Memory Devices We explored various different memory technologies (STTRAM, PCRAM, FeRAM, and ReRAM). The progress can be classified into three categories, below. Modeling and Tool Releases Various modeling tools have been developed over the last decade to help in the design of SRAM or DRAM-based memory hierarchies. To explore new design opportunities that NVM technologies can bring to the designers, we have developed similar high-level models for NVM, including PCRAMsim [Dong 2009], NVSim [Dong 2012], and NVMain [Poremba 2012]. NVSim is a circuit-level model for NVM performance, energy, and area estimation, which supports various NVM technologies, including STT-RAM, PCRAM, ReRAM, and legacy NAND Flash. NVSim is successfully validated against industrial NVM prototypes, and it is expected to help boost architecture-level NVM-related studies. On the other side, NVMain is a cycle accurate main memory simulator designed to simulate emerging nonvolatile memories at the architectural level. We have released these models as open source tools and provided contiguous support to them. We also proposed PS3-RAM, which is a fast, portable and scalable statistical STT-RAM reliability analysis model [Wen 2012]. Design Space Exploration and Optimization With the support of these models, we explore different device/circuit optimization techniques. For example, in [Niu 2012a] we studied the power reduction technique for the application of ECC scheme in ReRAM designs and proposed to use ECC code to relax the BER (Bit Error Rate) requirement of a single memory to improve the write energy consumption and latency for both 1T1R and cross-point ReRAM designs. In [Xu 2011], we proposed a methodology to design STT-RAM for different optimization goals such as read performance, write performance and write energy by leveraging the trade-off between write current and write time of MTJ. We also studied the tradeoffs in building a reliable crosspoint ReRAM array [Niu 2012b]. We have conducted an in depth analysis of the circuit and system level design implications of multi-layer cross-point Resistive RAM (MLCReRAM) from performance, power and reliability perspectives [Xu 2013]. The objective of this study is to understand the design trade-offs of this technology with respect to the MLC Phase Change Memory (MLCPCM).Our MLC ReRAM design at the circuit and system levels indicates that different resistance allocation schemes, programming strategies, peripheral designs, and material selections profoundly affect the area, latency, power, and reliability of MLC ReRAM. Based on this analysis, we conduct two case studies: first we compare MLC ReRAM design against MLC phase-change memory (PCM) and multi-layer cross-point ReRAM design, and point out why multi-level ReRAM is appealing; second we further explore the design space for MLC ReRAM. Architecture and Application We explored hybrid checkpointing using phase-change memory for future exascale systems [Dong 2011] and showed that the use of nonvolatile memory for local checkpointing significantly increases the number of faults covered by local checkpoints and reduces the probability of a global failure in the middle of a global checkpoint to less than 1%. We also proposed a technique called i2WAP to mitigate the write variations in NVM-based last-level cache for the improvement of the NVM lifetime [Wang 2013]. Our wear leveling technique attempts to work around the limitations of write endurance by arranging data access so that write operations can be distributed evenly across all the storage cells. During our intensive research on fault-tolerant NVM design, we found that ECC cannot effectively tolerate hard errors from limited write endurance and process imperfection. Therefore, we devised a novel Point and Discard (PAD) architecture in in [ 2012] as a hard-error-tolerant architecture for ReRAM-based Last Level Caches. PAD improves the lifetime of ReRAM caches by 1.6X-440X under different process variations without performance overhead in the system's early life. We have investigated the applicability of NVM for persistent memory design [Zhao 2013]. New byte addressable NVM enables fast persistent memory that allows in-memory persistent data objects to be updated with much higher throughput. Despite the significant improvement, the performance of these designs is only 50% of the native system with no persistence support, due to the logging or copy-on-write mechanisms used to update the persistent memory. A challenge in this approach is therefore how to efficiently enable atomic, consistent, and durable updates to ensure data persistence that survives application and/or system failures. We have designed a persistent memory system, called Klin, that can provide performance as close as that of the native system. The Klin design adopts a non-volatile cache and a non-volatile main memory for constructing a multi-versioned durable memory system, enabling atomic updates without logging or copy-on-write. Our evaluation shows that the proposed Kiln mechanism can achieve up to 2X of performance improvement to NVRAM-based persistent memory employing write-ahead logging. In addition, our design has numerous practical advantages: a simple and intuitive abstract interface, microarchitecture-level optimizations, fast recovery from failures, and no redundant writes to slow non-volatile storage media. The work was published in MICRO 2013 and received Best Paper Honorable Mentioned Award.

OSTI does not have a digital full text copy available. For more information, please see document availability, search WorldCat, or search Google Scholar.

Cite

Export

Save

Research Organization:: Hewlett Packard Laboratories, Palo Alto, CA (United States)

Sponsoring Organization:: USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)

DOE Contract Number:: SC0005026

OSTI ID:: 1164421

Report Number(s):: DOE-HPL-5026-1

Country of Publication:: United States

Language:: English

Similar Records

EqualChance: Addressing Intra-set Write Variation to Increase Lifetime of Non-volatile Caches

Conference · Wed Jan 01 00:00:00 EST 2014 · OSTI ID:1164421

Mittal, Sparsh; Vetter, Jeffrey S

LastingNVCache: A Technique for Improving the Lifetime of Non-volatile Caches

Conference · Wed Jan 01 00:00:00 EST 2014 · OSTI ID:1164421

Mittal, Sparsh; Vetter, Jeffrey S; Li, Dong

DESTINY

Software · Tue Mar 10 00:00:00 EDT 2015 · OSTI ID:1164421

Related Subjects

97 MATHEMATICS AND COMPUTING

Title: Blackcomb: Hardware-Software Co-design for Non-Volatile Memory in Exascale Systems

Citation Formats

Similar Records

Related Subjects