skip to main content
OSTI.GOV title logo U.S. Department of Energy
Office of Scientific and Technical Information

Title: Active non-volatile memory post-processing


A computing node includes an active Non-Volatile Random Access Memory (NVRAM) component which includes memory and a sub-processor component. The memory is to store data chunks received from a processor core, the data chunks comprising metadata indicating a type of post-processing to be performed on data within the data chunks. The sub-processor component is to perform post-processing of said data chunks based on said metadata.

; ;
Publication Date:
Research Org.:
Hewlett Packard Enterprise Development LP, Houston, TX (United States)
Sponsoring Org.:
OSTI Identifier:
Patent Number(s):
Application Number:
Hewlett Packard Enterprise Development LP CHO
DOE Contract Number:
Resource Type:
Resource Relation:
Patent File Date: 2012 Feb 24
Country of Publication:
United States

Citation Formats

Kannan, Sudarsun, Milojicic, Dejan S., and Talwar, Vanish. Active non-volatile memory post-processing. United States: N. p., 2017. Web.
Kannan, Sudarsun, Milojicic, Dejan S., & Talwar, Vanish. Active non-volatile memory post-processing. United States.
Kannan, Sudarsun, Milojicic, Dejan S., and Talwar, Vanish. Tue . "Active non-volatile memory post-processing". United States. doi:.
title = {Active non-volatile memory post-processing},
author = {Kannan, Sudarsun and Milojicic, Dejan S. and Talwar, Vanish},
abstractNote = {A computing node includes an active Non-Volatile Random Access Memory (NVRAM) component which includes memory and a sub-processor component. The memory is to store data chunks received from a processor core, the data chunks comprising metadata indicating a type of post-processing to be performed on data within the data chunks. The sub-processor component is to perform post-processing of said data chunks based on said metadata.},
doi = {},
journal = {},
number = ,
volume = ,
place = {United States},
year = {Tue Apr 11 00:00:00 EDT 2017},
month = {Tue Apr 11 00:00:00 EDT 2017}


Save / Share:
  • In this abstract, we study the performance and energy tradeoffs involved in migrating data analysis into the flash device, a process we refer to as Active Flash. The Active Flash paradigm is similar to 'active disks', which has received considerable attention. Active Flash allows us to move processing closer to data, thereby minimizing data movement costs and reducing power consumption. It enables true out-of-core computation. The conventional definition of out-of-core solvers refers to an approach to process data that is too large to fit in the main memory and, consequently, requires access to disk. However, in Active Flash, processing outsidemore » the host CPU literally frees the core and achieves real 'out-of-core' analysis. Moving analysis to data has long been desirable, not just at this level, but at all levels of the system hierarchy. However, this requires a detailed study on the tradeoffs involved in achieving analysis turnaround under an acceptable energy envelope. To this end, we first need to evaluate if there is enough computing power on the flash device to warrant such an exploration. Flash processors require decent computing power to run the internal logic pertaining to the Flash Translation Layer (FTL), which is responsible for operations such as address translation, garbage collection (GC) and wear-leveling. Modern SSDs are composed of multiple packages and several flash chips within a package. The packages are connected using multiple I/O channels to offer high I/O bandwidth. SSD computing power is also expected to be high enough to exploit such inherent internal parallelism within the drive to increase the bandwidth and to handle fast I/O requests. More recently, SSD devices are being equipped with powerful processing units and are even embedded with multicore CPUs (e.g. ARM Cortex-A9 embedded processor is advertised to reach 2GHz frequency and deliver 5000 DMIPS; OCZ RevoDrive X2 SSD has 4 SandForce controllers, each with 780MHz max frequency Tensilica core). Efforts that take advantage of the available computing cycles on the processors on SSDs to run auxiliary tasks other than actual I/O requests are beginning to emerge. Kim et al. investigate database scan operations in the context of processing on the SSDs, and propose dedicated hardware logic to speed up scans. Also, cluster architectures have been explored, which consist of low-power embedded CPUs coupled with small local flash to achieve fast, parallel access to data. Processor utilization on SSD is highly dependent on workloads and, therefore, they can be idle during periods with no I/O accesses. We propose to use the available processing capability on the SSD to run tasks that can be offloaded from the host. This paper makes the following contributions: (1) We have investigated Active Flash and its potential to optimize the total energy cost, including power consumption on the host and the flash device; (2) We have developed analytical models to analyze the performance-energy tradeoffs for Active Flash, by treating the SSD as a blackbox, this is particularly valuable due to the proprietary nature of the SSD internal hardware; and (3) We have enhanced a well-known SSD simulator (from MSR) to implement 'on-the-fly' data compression using Active Flash. Our results provide a window into striking a balance between energy consumption and application performance.« less
  • A non-volatile memory and a method of refreshing a memory are described. The method includes allowing an external system to control refreshing operations within the memory. The memory may generate a refresh request signal and transmit the refresh request signal to the external system. When the external system finds an available time to process the refresh request, the external system acknowledges the refresh request and transmits a refresh acknowledge signal to the memory. The memory may also comprise a page register for reading and rewriting a data state back to the memory. The page register may comprise latches in lieumore » of supplemental non-volatile storage elements, thereby conserving real estate within the memory.« less
  • Methods, apparatus and articles of manufacture to secure non-volatile memory regions are disclosed. An example method disclosed herein comprises associating a first key pair and a second key pair different than the first key pair with a process, using the first key pair to secure a first region of a non-volatile memory for the process, and using the second key pair to secure a second region of the non-volatile memory for the same process, the second region being different than the first region.
  • A system, method and computer program product for supporting system initiated checkpoints in high performance parallel computing systems and storing of checkpoint data to a non-volatile memory storage device. The system and method generates selective control signals to perform checkpointing of system related data in presence of messaging activity associated with a user application running at the node. The checkpointing is initiated by the system such that checkpoint data of a plurality of network nodes may be obtained even in the presence of user applications running on highly parallel computers that include ongoing user messaging activity. In one embodiment, themore » non-volatile memory is a pluggable flash memory card.« less
  • Endpoint-based parallel data processing with non-blocking collective instructions in a PAMI of a parallel computer is disclosed. The PAMI is composed of data communications endpoints, each including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task. The compute nodes are coupled for data communications through the PAMI. The parallel application establishes a data communications geometry specifying a set of endpoints that are used in collective operations of the PAMI by associating with the geometry a list of collective algorithms valid for use with themore » endpoints of the geometry; registering in each endpoint in the geometry a dispatch callback function for a collective operation; and executing without blocking, through a single one of the endpoints in the geometry, an instruction for the collective operation.« less