Two-level main memory co-design: Multi-threaded algorithmic primitives, analysis, and simulation
Abstract
A challenge in computer architecture is that processors often cannot be fed data from DRAM as fast as CPUs can consume it. Therefore, many applications are memory-bandwidth bound. With this motivation and the realization that traditional architectures (with all DRAM reachable only via bus) are insufficient to feed groups of modern processing units, vendors have introduced a variety of non-DDR 3D memory technologies (Hybrid Memory Cube (HMC), Wide I/O 2, High Bandwidth Memory (HBM)). These offer higher bandwidth and lower power by stacking DRAM chips on the processor or nearby on a silicon interposer. We will call these solutions “near-memory,” and if user-addressable, “scratchpad.” High-performance systems on the market now offer two levels of main memory: near-memory on package and traditional DRAM further away. In the near term we expect the latencies of near-memory and DRAM to be similar. Here, it is natural to think of near-memory as another module on the DRAM level of the memory hierarchy. Vendors are expected to offer modes in which the near-memory is used as cache, but we believe that this will be inefficient.
- Authors:
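- Bender, Michael A.; Berry, Jonathan W.; Hammond, Simon D.; Hemmert, K. Scott; McCauley, Samuel; Moore, Branden; Moseley, Benjamin; Phillips, Cynthia A.; Resnick, David; Rodrigues, Arun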
- Stony Brook Univ., Stony Brook, NY (United States)
- Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
- Washington Univ., St. Louis, MO (United States)
- Publication Date:
- 2017-01-03
- Research Org.:
- Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
- Sponsoring Org.:
- USDOE National Nuclear Security Administration (NNSA)
- OSTI Identifier:
- 1371471
- Alternate Identifier(s):
- OSTI ID: 1414597
- Report Number(s):
- SAND-2015-9641J
- Journal ID: ISSN 0743-7315; PII: S074373151630185X
- Grant/Contract Number:
- AC04-94AL85000
- Resource Type:
- Accepted Manuscript
- Journal Name:
- Journal of Parallel and Distributed Computing
- Additional Journal Information:
- Journal Volume: 102; Journal Issue: C; Journal ID: ISSN 0743-7315
- Publisher:
- Elsevier
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 97 MATHEMATICS AND COMPUTING; two-level memory; high-bandwidth memory; sorting; k-means clustering
Citation Formats
Bender, Michael A., Berry, Jonathan W., Hammond, Simon D., Hemmert, K. Scott, McCauley, Samuel, Moore, Branden, Moseley, Benjamin, Phillips, Cynthia A., Resnick, David, and Rodrigues, Arun. Two-level main memory co-design: Multi-threaded algorithmic primitives, analysis, and simulation. United States: N. p., 2017. Web. doi:10.1016/j.jpdc.2016.12.009.
Bender, Michael A., Berry, Jonathan W., Hammond, Simon D., Hemmert, K. Scott, McCauley, Samuel, Moore, Branden, Moseley, Benjamin, Phillips, Cynthia A., Resnick, David, & Rodrigues, Arun. Two-level main memory co-design: Multi-threaded algorithmic primitives, analysis, and simulation. United States. https://doi.org/10.1016/j.jpdc.2016.12.009
Bender, Michael A., Berry, Jonathan W., Hammond, Simon D., Hemmert, K. Scott, McCauley, Samuel, Moore, Branden, Moseley, Benjamin, Phillips, Cynthia A., Resnick, David, and Rodrigues, Arun. 2017. "Two-level main memory co-design: Multi-threaded algorithmic primitives, analysis, and simulation". United States. https://doi.org/10.1016/j.jpdc.2016.12.009. https://www.osti.gov/servlets/purl/1371471.
@article{osti_1371471,
title = {Two-level main memory co-design: Multi-threaded algorithmic primitives, analysis, and simulation},
author = {Bender, Michael A. and Berry, Jonathan W. and Hammond, Simon D. and Hemmert, K. Scott and McCauley, Samuel and Moore, Branden and Moseley, Benjamin and Phillips, Cynthia A. and Resnick, David and Rodrigues, Arun},
abstractNote = {A challenge in computer architecture is that processors often cannot be fed data from DRAM as fast as CPUs can consume it. Therefore, many applications are memory-bandwidth bound. With this motivation and the realization that traditional architectures (with all DRAM reachable only via bus) are insufficient to feed groups of modern processing units, vendors have introduced a variety of non-DDR 3D memory technologies (Hybrid Memory Cube (HMC), Wide I/O 2, High Bandwidth Memory (HBM)). These offer higher bandwidth and lower power by stacking DRAM chips on the processor or nearby on a silicon interposer. We will call these solutions “near-memory,” and if user-addressable, “scratchpad.” High-performance systems on the market now offer two levels of main memory: near-memory on package and traditional DRAM further away. In the near term we expect the latencies of near-memory and DRAM to be similar. Here, it is natural to think of near-memory as another module on the DRAM level of the memory hierarchy. Vendors are expected to offer modes in which the near-memory is used as cache, but we believe that this will be inefficient.},
doi = {10.1016/j.jpdc.2016.12.009},
journal = {Journal of Parallel and Distributed Computing},
number = {C},
volume = 102,
place = {United States},
year = {2017},
month = jan
}