Two-level main memory co-design: Multi-threaded algorithmic primitives, analysis, and simulation

Bender, Michael A.; Berry, Jonathan W.; Hammond, Simon D.; Hemmert, K. Scott; McCauley, Samuel; Moore, Branden; Moseley, Benjamin; Phillips, Cynthia A.; Resnick, David; Rodrigues, Arun

doi:10.1016/j.jpdc.2016.12.009


Abstract

A challenge in computer architecture is that processors often cannot be fed data from DRAM as fast as CPUs can consume it. Therefore, many applications are memory-bandwidth bound. With this motivation and the realization that traditional architectures (with all DRAM reachable only via bus) are insufficient to feed groups of modern processing units, vendors have introduced a variety of non-DDR 3D memory technologies (Hybrid Memory Cube (HMC), Wide I/O 2, High Bandwidth Memory (HBM)). These offer higher bandwidth and lower power by stacking DRAM chips on the processor or nearby on a silicon interposer. We will call these solutions “near-memory,” and if user-addressable, “scratchpad.” High-performance systems on the market now offer two levels of main memory: near-memory on package and traditional DRAM further away. In the near term we expect the latencies of near-memory and DRAM to be similar. Here, it is natural to think of near-memory as another module on the DRAM level of the memory hierarchy. Vendors are expected to offer modes in which the near-memory is used as cache, but we believe that this will be inefficient.

Authors:

Bender, Michael A. ^[1]; Berry, Jonathan W. ^[2]; Hammond, Simon D. ^[2]; Hemmert, K. Scott ^[2]; McCauley, Samuel ^[1]; Moore, Branden ^[2]; Moseley, Benjamin ^[3]; Phillips, Cynthia A. ^[2]; Resnick, David ^[2]; Rodrigues, Arun ^[2]

[1] Stony Brook Univ., Stony Brook, NY (United States)
[2] Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
[3] Washington Univ., St. Louis, MO (United States)

Publication Date: 2017-01-03

Research Org.: Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

Sponsoring Org.: USDOE National Nuclear Security Administration (NNSA)

OSTI Identifier: 1371471

Alternate Identifier(s): OSTI ID: 1414597

Report Number(s): SAND-2015-9641J
Journal ID: ISSN 0743-7315; PII: S074373151630185X

Grant/Contract Number: AC04-94AL85000

Resource Type: Accepted Manuscript

Journal Name: Journal of Parallel and Distributed Computing

Additional Journal Information: Journal Volume: 102; Journal Issue: C; Journal ID: ISSN 0743-7315

Publisher: Elsevier

Country of Publication: United States

Language: English

Subject: 97 MATHEMATICS AND COMPUTING; two-level memory; high-bandwidth memory; sorting; k-means clustering

Citation Formats


Bender, Michael A., Berry, Jonathan W., Hammond, Simon D., Hemmert, K. Scott, McCauley, Samuel, Moore, Branden, Moseley, Benjamin, Phillips, Cynthia A., Resnick, David, and Rodrigues, Arun. Two-level main memory co-design: Multi-threaded algorithmic primitives, analysis, and simulation. United States: N. p., 2017. Web. doi:10.1016/j.jpdc.2016.12.009.


Bender, Michael A., Berry, Jonathan W., Hammond, Simon D., Hemmert, K. Scott, McCauley, Samuel, Moore, Branden, Moseley, Benjamin, Phillips, Cynthia A., Resnick, David, & Rodrigues, Arun. Two-level main memory co-design: Multi-threaded algorithmic primitives, analysis, and simulation. United States. https://doi.org/10.1016/j.jpdc.2016.12.009


Bender, Michael A., Berry, Jonathan W., Hammond, Simon D., Hemmert, K. Scott, McCauley, Samuel, Moore, Branden, Moseley, Benjamin, Phillips, Cynthia A., Resnick, David, and Rodrigues, Arun. 2017. "Two-level main memory co-design: Multi-threaded algorithmic primitives, analysis, and simulation". United States. https://doi.org/10.1016/j.jpdc.2016.12.009. https://www.osti.gov/servlets/purl/1371471.


                    
@article{osti_1371471,
  title        = {Two-level main memory co-design: Multi-threaded algorithmic primitives, analysis, and simulation},
  author       = {Bender, Michael A. and Berry, Jonathan W. and Hammond, Simon D. and Hemmert, K. Scott and McCauley, Samuel and Moore, Branden and Moseley, Benjamin and Phillips, Cynthia A. and Resnick, David and Rodrigues, Arun},
  abstractNote = {A challenge in computer architecture is that processors often cannot be fed data from DRAM as fast as CPUs can consume it. Therefore, many applications are memory-bandwidth bound. With this motivation and the realization that traditional architectures (with all DRAM reachable only via bus) are insufficient to feed groups of modern processing units, vendors have introduced a variety of non-DDR 3D memory technologies (Hybrid Memory Cube (HMC), Wide I/O 2, High Bandwidth Memory (HBM)). These offer higher bandwidth and lower power by stacking DRAM chips on the processor or nearby on a silicon interposer. We will call these solutions “near-memory,” and if user-addressable, “scratchpad.” High-performance systems on the market now offer two levels of main memory: near-memory on package and traditional DRAM further away. In the near term we expect the latencies of near-memory and DRAM to be similar. Here, it is natural to think of near-memory as another module on the DRAM level of the memory hierarchy. Vendors are expected to offer modes in which the near-memory is used as cache, but we believe that this will be inefficient.},
  doi          = {10.1016/j.jpdc.2016.12.009},
  journal      = {Journal of Parallel and Distributed Computing},
  number       = {C},
  volume       = {102},
  place        = {United States},
  year         = {2017},
  month        = jan
}

Journal Article:

Publisher's Version of Record

https://doi.org/10.1016/j.jpdc.2016.12.009

Citation Metrics:

Cited by: 3 works

Citation information provided by
Web of Science

Similar Records in DOE PAGES and OSTI.GOV collections:

MAC: Memory Access Coalescer for 3D-Stacked Memory

Conference Wang, Xi ; Tumeo, Antonino ; Leidel, John D. ; ...

Emerging data-intensive applications, such as graph analytics and data mining, exhibit irregular memory access patterns. Research has shown that with these memory-bound applications, traditional cache-based processor architectures, which exploit locality and regular patterns to mitigate the memory-wall issue, are inefficient. Meantime, novel 3D-stacked memory devices, such as Hybrid Memory Cube (HMC) and High Bandwidth Memory (HBM), promise significant increases in bandwidth that appear extremely appealing for memory-bound applications. However, conventional memory interfaces designed for cache-based architectures and JEDEC DDR devices fit poorly with the 3D-stacked memory, which leads to significant under-utilization of the promised high bandwidth. As a response ...
https://doi.org/10.1145/3337821.3337867

HAM: Hotspot-Aware Manager for Improving Communications with 3D-Stacked Memory

Journal Article Wang, Xi ; Tumeo, Antonino ; Leidel, John D. ; ... - IEEE Transactions on Computers

Emerging High-Performance Computing (HPC) workloads, such as graph analytics, machine learning, and big data science, are data-intensive. Data-intensive workloads usually present fine-grained memory accesses with limited or no data locality, and thus incur frequent cache misses and low utilization of memory bandwidth. 3D-stacked memory devices such as Hybrid Memory Cube (HMC) and High Bandwidth Memory (HBM) can provide significantly higher bandwidth than conventional memory modules. However, the traditional interfaces and optimization methods for JEDEC DDR devices do not allow to fully exploit the potential performance of 3D-stacked memory with the massive amount of irregular memory accesses of data-intensive applications. In ...
https://doi.org/10.1109/TC.2021.3066982

PIMS: Memristor-Based Processing-in-Memory-and-Storage.

Technical Report Cook, Jeanine

Continued progress in computing has augmented the quest for higher performance with a new quest for higher energy efficiency. This has led to the re-emergence of Processing-In-Memory (PIM) architectures that offer higher density and performance with some boost in energy efficiency. Past PIM work either integrated a standard CPU with a conventional DRAM to improve the CPU-memory link, or used a bit-level processor with Single Instruction Multiple Data (SIMD) control, but neither matched the energy consumption of the memory to the computation. We originally proposed to develop a new architecture derived from PIM that more effectively addressed energy ...
https://doi.org/10.2172/1424888

Building more powerful less expensive supercomputers using Processing-In-Memory (PIM) LDRD final report.

Technical Report Murphy, Richard C

This report details the accomplishments of the 'Building More Powerful Less Expensive Supercomputers Using Processing-In-Memory (PIM)' LDRD ('PIM LDRD', number 105809) for FY07-FY09. Latency dominates all levels of supercomputer design. Within a node, increasing memory latency, relative to processor cycle time, limits CPU performance. Between nodes, the same increase in relative latency impacts scalability. Processing-In-Memory (PIM) is an architecture that directly addresses this problem using enhanced chip fabrication technology and machine organization. PIMs combine high-speed logic and dense, low-latency, high-bandwidth DRAM, and lightweight threads that tolerate latency by performing useful work during memory transactions. This work examines the potential of ...
https://doi.org/10.2172/993898

Evaluating the Opportunities for Multi-Level Memory - An ASC 2016 L2 Milestone

Technical Report Voskuilen, Gwendolyn Renae ; Frank, Michael P. ; Hammond, Simon David ; ...

As new memory technologies appear on the market, there is a growing push to incorporate them into future architectures. Compared to traditional DDR DRAM, these technologies provide appealing advantages such as increased bandwidth or non-volatility. However, the technologies have significant downsides as well including higher cost, manufacturing complexity, and for non-volatile memories, higher latency and wear-out limitations. As such, no technology has emerged as a clear technological and economic winner. As a result, systems are turning to the concept of multi-level memory, or mixing multiple memory technologies in a single system to balance cost, performance, and reliability.
https://doi.org/10.2172/1562213
