U.S. Department of Energy
Office of Scientific and Technical Information

Evaluating the potential of disaggregated memory systems for HPC applications

Journal Article · Concurrency and Computation: Practice and Experience
DOI: https://doi.org/10.1002/cpe.8147 · OSTI ID: 2369149
Author affiliations:
  1. Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA
  2. Department of Physics and Astronomy, Iowa State University, Ames, Iowa, USA
  3. National Energy Research Scientific Computing Center, Lawrence Berkeley National Laboratory, Berkeley, California, USA
Summary

Disaggregated memory is a promising approach that addresses the limitations of traditional memory architectures by decoupling memory from compute nodes and sharing it across a data center. Cloud platforms have deployed such systems to improve overall memory utilization, but performance can vary across workloads. High-performance computing (HPC) is crucial in scientific and engineering applications, and HPC machines likewise suffer from underutilized memory. Improving system memory utilization while understanding workload performance is therefore essential for HPC operators, and assessing the potential of a disaggregated memory system before deployment is a critical step. This paper proposes a methodology for exploring the design space of a disaggregated memory system. It incorporates the key metrics that affect performance on such systems: memory capacity, the local-to-remote memory access ratio, injection bandwidth, and bisection bandwidth, providing an intuitive approach to guiding machine configurations based on technology trends and workload characteristics. We apply our methodology to thirteen diverse workloads spanning AI training, data analysis, genomics, protein sequence alignment, fusion, atomic nuclei, and traditional HPC bookends. The methodology exposes the potential and pitfalls of a disaggregated memory system and motivates specific machine configurations. Our results show that eleven of the thirteen applications can use disaggregated memory at injection bandwidth without affecting performance, while one pays a rack bisection bandwidth penalty and two pay a system-wide bisection bandwidth penalty. We also show that intra-rack memory disaggregation would meet the applications' memory requirements and provide enough remote memory bandwidth.
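
The metrics named in the summary lend themselves to a back-of-the-envelope screening of candidate configurations. The Python sketch below is only an illustration of that style of reasoning, not the paper's actual model: all class names, parameter names, example values, and the tiered comparison (local capacity, then injection, rack bisection, and system-wide bisection bandwidth) are assumptions introduced here for clarity.

# Illustrative sketch (assumed names and numbers): classify a workload's
# remote-memory traffic against the bandwidth tiers discussed in the summary.
from dataclasses import dataclass

@dataclass
class Workload:
    name: str
    mem_per_node_gib: float        # memory footprint per compute node
    remote_access_fraction: float  # share of memory traffic served by remote (disaggregated) memory
    mem_bw_demand_gbs: float       # sustained memory bandwidth demand per node (GB/s)

@dataclass
class Machine:
    local_mem_gib: float            # memory local to each compute node
    injection_bw_gbs: float         # NIC injection bandwidth per node (GB/s)
    rack_bisection_bw_gbs: float    # rack bisection bandwidth available per node (GB/s)
    system_bisection_bw_gbs: float  # system-wide bisection bandwidth available per node (GB/s)

def classify(w: Workload, m: Machine) -> str:
    """Return which bandwidth tier would bound this workload's remote-memory traffic."""
    spill_gib = max(0.0, w.mem_per_node_gib - m.local_mem_gib)
    remote_bw = w.mem_bw_demand_gbs * w.remote_access_fraction
    if spill_gib == 0.0:
        return "fits in local memory"
    if remote_bw <= m.injection_bw_gbs:
        return "bounded by injection bandwidth (no penalty expected)"
    if remote_bw <= m.rack_bisection_bw_gbs:
        return "pays a rack bisection bandwidth penalty"
    if remote_bw <= m.system_bisection_bw_gbs:
        return "pays a system-wide bisection bandwidth penalty"
    return "remote bandwidth demand exceeds the modeled system"

if __name__ == "__main__":
    # Example numbers are hypothetical, chosen only to exercise the comparison.
    machine = Machine(local_mem_gib=256, injection_bw_gbs=25,
                      rack_bisection_bw_gbs=12.5, system_bisection_bw_gbs=6.25)
    app = Workload("example-app", mem_per_node_gib=384,
                   remote_access_fraction=0.3, mem_bw_demand_gbs=60)
    print(app.name, "->", classify(app, machine))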

Sponsoring Organization:
USDOE
Grant/Contract Number:
AC02-05CH11231; SC0023495
OSTI ID:
2369149
Journal Information:
Concurrency and Computation: Practice and Experience; ISSN 1532-0626
Publisher:
Wiley Blackwell (John Wiley & Sons)
Country of Publication:
United Kingdom
Language:
English

References (35)

Nonlinear magnetohydrodynamics simulation using high-order finite elements journal March 2004
Accelerating an iterative eigensolver for nuclear structure configuration interaction calculations on GPUs using OpenACC journal March 2022
Terabase-scale metagenome coassembly with MetaHipMer journal July 2020
The M3D-C1 approach to simulating 3D 2-fluid magnetohydrodynamics in magnetic fusion experiments journal July 2008
Deep Residual Learning for Image Recognition conference June 2016
Abstract - HOTI 2019: Compute Express Link conference August 2019
DASSA: Parallel DAS Data Storage and Analysis for Subsurface Event Detection conference May 2020
Nvidia Hopper GPU and Grace CPU Highlights journal March 2022
AMD Fusion APU: Llano journal March 2012
Novel Composable and Scaleout Architectures Using Compute Express Link journal March 2023
Architectural Requirements for Deep Learning Workloads in HPC Environments conference November 2021
Methodology for Evaluating the Potential of Disaggregated Memory Systems conference November 2022
On the Efficacy of a Fused CPU+GPU Processor (or APU) for Parallel Computing conference July 2011
On the Memory Underutilization: Exploring Disaggregated Memory on HPC Systems conference September 2020
Exascale Deep Learning for Climate Analytics conference November 2018
CosmoFlow: Using Deep Learning to Learn the Universe at Scale
  • Mathuriya, Amrita; Bard, Deborah; Mendygral, Peter
  • SC18: International Conference for High Performance Computing, Networking, Storage and Analysis https://doi.org/10.1109/SC.2018.00068
conference November 2018
An In-Depth Analysis of the Slingshot Interconnect conference November 2020
Distributed Many-to-Many Protein Sequence Alignment using Sparse Matrices conference November 2020
Bisection (Band)Width of Product Networks with Application to Data Centers journal March 2014
Leveraging One-Sided Communication for Sparse Triangular Solvers book January 2020
A Message-Driven, Multi-GPU Parallel Sparse Triangular Solver book January 2021
Latency lags bandwidth journal October 2004
Optimal sparse matrix dense vector multiplication in the I/O-model conference June 2007
Roofline: an insightful visual performance model for multicore architectures journal April 2009
Disaggregated memory for expansion and sharing in blade servers journal June 2009
Brief announcement
  • Grigori, Laura; David, Pierre-Yves; Demmel, James W.
  • Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures https://doi.org/10.1145/1810479.1810496
conference June 2010
Redesigning CAM-SE for peta-scale climate modeling performance and ultra-high resolution on Sunway TaihuLight
  • Fu, Haohuan; Liu, Weiguo; Wang, Lanning
  • Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis on - SC '17 https://doi.org/10.1145/3126908.3126909
conference January 2017
Can far memory improve job throughput? conference April 2020
Rethinking software runtimes for disaggregated memory
  • Calciu, Irina; Imran, M. Talha; Puddu, Ivan
  • Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems https://doi.org/10.1145/3445814.3446713
conference April 2021
Clio: a hardware-software co-designed disaggregated memory system
  • Guo, Zhiyuan; Shan, Yizhou; Luo, Xuhao
  • Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems https://doi.org/10.1145/3503222.3507762
conference February 2022
A Case For Intra-rack Resource Disaggregation in HPC
  • Michelogiannakis, George; Klenk, Benjamin; Cook, Brandon
  • ACM Transactions on Architecture and Code Optimization, Vol. 19, Issue 2 https://doi.org/10.1145/3514245
journal June 2022
Pond: CXL-Based Memory Pooling Systems for Cloud Platforms
  • Li, Huaicheng; Berger, Daniel S.; Hsu, Lisa
  • Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2 https://doi.org/10.1145/3575693.3578835
conference January 2023
The high-speed networks of the Summit and Sierra supercomputers journal May 2020
ADEPT: a domain independent sequence alignment strategy for gpu architectures journal September 2020
Optically Disaggregated Data Centers With Minimal Remote Memory Latency: Technologies, Architectures, and Resource Allocation [Invited] journal January 2018

Similar Records

A Case For Intra-rack Resource Disaggregation in HPC
Journal Article · March 2022 · ACM Transactions on Architecture and Code Optimization · OSTI ID: 1878112

The Institute for Sustained Performance, Energy, and Resilience
Technical Report · November 2019 · OSTI ID: 1481285

Opal: A Centralized Memory Manager for Investigating Disaggregated Memory Systems
Technical Report · August 2018 · OSTI ID: 1467164
