OSTI.GOV: U.S. Department of Energy, Office of Scientific and Technical Information

Title: Two-level main memory co-design: Multi-threaded algorithmic primitives, analysis, and simulation

Abstract

A challenge in computer architecture is that processors often cannot be fed data from DRAM as fast as CPUs can consume it. Therefore, many applications are memory-bandwidth bound. With this motivation and the realization that traditional architectures (with all DRAM reachable only via bus) are insufficient to feed groups of modern processing units, vendors have introduced a variety of non-DDR 3D memory technologies (Hybrid Memory Cube (HMC), Wide I/O 2, High Bandwidth Memory (HBM)). These offer higher bandwidth and lower power by stacking DRAM chips on the processor or nearby on a silicon interposer. We will call these solutions “near-memory,” and if user-addressable, “scratchpad.” High-performance systems on the market now offer two levels of main memory: near-memory on package and traditional DRAM further away. In the near term we expect the latencies of near-memory and DRAM to be similar. Here, it is natural to think of near-memory as another module on the DRAM level of the memory hierarchy. Vendors are expected to offer modes in which the near-memory is used as cache, but we believe that this will be inefficient.

Authors:
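 Bender, Michael A. [1]; Berry, Jonathan W. [2]; Hammond, Simon D. [2]; Hemmert, K. Scott [2]; McCauley, Samuel [1]; Moore, Branden [2]; Moseley, Benjamin [3]; Phillips, Cynthia A. [2]; Resnick, David [2]; Rodrigues, Arun [2]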
  1. Stony Brook Univ., Stony Brook, NY (United States)
  2. Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
  3. Washington Univ., St. Louis, MO (United States)
Publication Date:
Research Org.:
Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
Sponsoring Org.:
USDOE National Nuclear Security Administration (NNSA)
OSTI Identifier:
1371471
Alternate Identifier(s):
OSTI ID: 1414597
Report Number(s):
SAND-2015-9641J
Journal ID: ISSN 0743-7315; PII: S074373151630185X
Grant/Contract Number:  
AC04-94AL85000
Resource Type:
Journal Article: Accepted Manuscript
Journal Name:
Journal of Parallel and Distributed Computing
Additional Journal Information:
Journal Volume: 102; Journal Issue: C; Journal ID: ISSN 0743-7315
Publisher:
Elsevier
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; two-level memory; high-bandwidth memory; sorting; k-means clustering

Citation Formats

Bender, Michael A., Berry, Jonathan W., Hammond, Simon D., Hemmert, K. Scott, McCauley, Samuel, Moore, Branden, Moseley, Benjamin, Phillips, Cynthia A., Resnick, David, and Rodrigues, Arun. Two-level main memory co-design: Multi-threaded algorithmic primitives, analysis, and simulation. United States: N. p., 2017. Web. doi:10.1016/j.jpdc.2016.12.009.
Bender, Michael A., Berry, Jonathan W., Hammond, Simon D., Hemmert, K. Scott, McCauley, Samuel, Moore, Branden, Moseley, Benjamin, Phillips, Cynthia A., Resnick, David, & Rodrigues, Arun. Two-level main memory co-design: Multi-threaded algorithmic primitives, analysis, and simulation. United States. doi:10.1016/j.jpdc.2016.12.009.
Bender, Michael A., Berry, Jonathan W., Hammond, Simon D., Hemmert, K. Scott, McCauley, Samuel, Moore, Branden, Moseley, Benjamin, Phillips, Cynthia A., Resnick, David, and Rodrigues, Arun. "Two-level main memory co-design: Multi-threaded algorithmic primitives, analysis, and simulation". United States. doi:10.1016/j.jpdc.2016.12.009. https://www.osti.gov/servlets/purl/1371471.
@article{osti_1371471,
title = {Two-level main memory co-design: Multi-threaded algorithmic primitives, analysis, and simulation},
author = {Bender, Michael A. and Berry, Jonathan W. and Hammond, Simon D. and Hemmert, K. Scott and McCauley, Samuel and Moore, Branden and Moseley, Benjamin and Phillips, Cynthia A. and Resnick, David and Rodrigues, Arun},
abstractNote = {A challenge in computer architecture is that processors often cannot be fed data from DRAM as fast as CPUs can consume it. Therefore, many applications are memory-bandwidth bound. With this motivation and the realization that traditional architectures (with all DRAM reachable only via bus) are insufficient to feed groups of modern processing units, vendors have introduced a variety of non-DDR 3D memory technologies (Hybrid Memory Cube (HMC), Wide I/O 2, High Bandwidth Memory (HBM)). These offer higher bandwidth and lower power by stacking DRAM chips on the processor or nearby on a silicon interposer. We will call these solutions “near-memory,” and if user-addressable, “scratchpad.” High-performance systems on the market now offer two levels of main memory: near-memory on package and traditional DRAM further away. In the near term we expect the latencies of near-memory and DRAM to be similar. Here, it is natural to think of near-memory as another module on the DRAM level of the memory hierarchy. Vendors are expected to offer modes in which the near-memory is used as cache, but we believe that this will be inefficient.},
doi = {10.1016/j.jpdc.2016.12.009},
journal = {Journal of Parallel and Distributed Computing},
issn = {0743-7315},
number = {C},
volume = 102,
place = {United States},
year = {2017},
month = {1}
}
