U.S. Department of Energy
Office of Scientific and Technical Information

Evaluating the potential of disaggregated memory systems for HPC applications

Journal Article · Concurrency and Computation: Practice and Experience
DOI: https://doi.org/10.1002/cpe.8147 · OSTI ID: 2369149
Author affiliations:
  1. Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA
  2. Department of Physics and Astronomy, Iowa State University, Ames, Iowa, USA
  3. National Energy Research Scientific Computing Center, Lawrence Berkeley National Laboratory, Berkeley, California, USA
Summary

Disaggregated memory is a promising approach that addresses the limitations of traditional memory architectures by decoupling memory from compute nodes and sharing it across a data center. Cloud platforms have deployed such systems to improve overall memory utilization, but performance can vary across workloads. High-performance computing (HPC) is crucial in scientific and engineering applications, and HPC machines likewise suffer from underutilized memory. Improving system memory utilization while understanding workload performance is therefore essential for HPC operators, and assessing the potential of a disaggregated memory system before deployment is a critical step. This paper proposes a methodology for exploring the design space of a disaggregated memory system. It incorporates the key metrics that affect performance on such systems: memory capacity, the local-to-remote memory access ratio, injection bandwidth, and bisection bandwidth, providing an intuitive approach to guiding machine configurations based on technology trends and workload characteristics. We apply our methodology to thirteen diverse workloads spanning AI training, data analysis, genomics, protein sequence alignment, fusion, atomic nuclei, and traditional HPC bookends. The methodology exposes the potential and pitfalls of a disaggregated memory system and motivates specific machine configurations. Our results show that eleven of the thirteen applications can use disaggregated memory at injection bandwidth without affecting performance, while one pays a rack bisection bandwidth penalty and two pay a system-wide bisection bandwidth penalty. We also show that intra-rack memory disaggregation would meet the applications' memory requirements and provide enough remote memory bandwidth.
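
The metrics named in the summary lend themselves to a back-of-the-envelope screening of candidate configurations. The Python sketch below is only an illustration of that style of reasoning, not the paper's actual model: all class names, parameter names, example values, and the tiered comparison (local capacity, then injection, rack bisection, and system-wide bisection bandwidth) are assumptions introduced here for clarity.

# Illustrative sketch (assumed names and numbers): classify a workload's
# remote-memory traffic against the bandwidth tiers discussed in the summary.
from dataclasses import dataclass

@dataclass
class Workload:
    name: str
    mem_per_node_gib: float        # memory footprint per compute node
    remote_access_fraction: float  # share of memory traffic served by remote (disaggregated) memory
    mem_bw_demand_gbs: float       # sustained memory bandwidth demand per node (GB/s)

@dataclass
class Machine:
    local_mem_gib: float            # memory local to each compute node
    injection_bw_gbs: float         # NIC injection bandwidth per node (GB/s)
    rack_bisection_bw_gbs: float    # rack bisection bandwidth available per node (GB/s)
    system_bisection_bw_gbs: float  # system-wide bisection bandwidth available per node (GB/s)

def classify(w: Workload, m: Machine) -> str:
    """Return which bandwidth tier would bound this workload's remote-memory traffic."""
    spill_gib = max(0.0, w.mem_per_node_gib - m.local_mem_gib)
    remote_bw = w.mem_bw_demand_gbs * w.remote_access_fraction
    if spill_gib == 0.0:
        return "fits in local memory"
    if remote_bw <= m.injection_bw_gbs:
        return "bounded by injection bandwidth (no penalty expected)"
    if remote_bw <= m.rack_bisection_bw_gbs:
        return "pays a rack bisection bandwidth penalty"
    if remote_bw <= m.system_bisection_bw_gbs:
        return "pays a system-wide bisection bandwidth penalty"
    return "remote bandwidth demand exceeds the modeled system"

if __name__ == "__main__":
    # Example numbers are hypothetical, chosen only to exercise the comparison.
    machine = Machine(local_mem_gib=256, injection_bw_gbs=25,
                      rack_bisection_bw_gbs=12.5, system_bisection_bw_gbs=6.25)
    app = Workload("example-app", mem_per_node_gib=384,
                   remote_access_fraction=0.3, mem_bw_demand_gbs=60)
    print(app.name, "->", classify(app, machine))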

Sponsoring Organization:
USDOE
Grant/Contract Number:
AC02-05CH11231; SC0023495
OSTI ID:
2369149
Journal Information:
Concurrency and Computation: Practice and Experience; ISSN 1532-0626
Publisher:
Wiley Blackwell (John Wiley & Sons)
Country of Publication:
United Kingdom
Language:
English

References (35)

Nonlinear magnetohydrodynamics simulation using high-order finite elements journal March 2004
Accelerating an iterative eigensolver for nuclear structure configuration interaction calculations on GPUs using OpenACC journal March 2022
Terabase-scale metagenome coassembly with MetaHipMer journal July 2020
The M3D-C1 approach to simulating 3D 2-fluid magnetohydrodynamics in magnetic fusion experiments journal July 2008
Deep Residual Learning for Image Recognition conference June 2016
Abstract - HOTI 2019: Compute Express Link conference August 2019
DASSA: Parallel DAS Data Storage and Analysis for Subsurface Event Detection conference May 2020
Nvidia Hopper GPU and Grace CPU Highlights journal March 2022
AMD Fusion APU: Llano journal March 2012
Novel Composable and Scaleout Architectures Using Compute Express Link journal March 2023
Architectural Requirements for Deep Learning Workloads in HPC Environments conference November 2021
Methodology for Evaluating the Potential of Disaggregated Memory Systems conference November 2022
On the Efficacy of a Fused CPU+GPU Processor (or APU) for Parallel Computing conference July 2011
On the Memory Underutilization: Exploring Disaggregated Memory on HPC Systems conference September 2020
Exascale Deep Learning for Climate Analytics conference November 2018
CosmoFlow: Using Deep Learning to Learn the Universe at Scale
  • Mathuriya, Amrita; Bard, Deborah; Mendygral, Peter
  • SC18: International Conference for High Performance Computing, Networking, Storage and Analysis https://doi.org/10.1109/SC.2018.00068
conference November 2018
An In-Depth Analysis of the Slingshot Interconnect conference November 2020
Distributed Many-to-Many Protein Sequence Alignment using Sparse Matrices conference November 2020
Bisection (Band)Width of Product Networks with Application to Data Centers journal March 2014
Leveraging One-Sided Communication for Sparse Triangular Solvers book January 2020
A Message-Driven, Multi-GPU Parallel Sparse Triangular Solver book January 2021
Latency lags bandwidth journal October 2004
Optimal sparse matrix dense vector multiplication in the I/O-model conference June 2007
Roofline: an insightful visual performance model for multicore architectures journal April 2009
Disaggregated memory for expansion and sharing in blade servers journal June 2009
Brief announcement
  • Grigori, Laura; David, Pierre-Yves; Demmel, James W.
  • Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures https://doi.org/10.1145/1810479.1810496
conference June 2010
Redesigning CAM-SE for peta-scale climate modeling performance and ultra-high resolution on Sunway TaihuLight
  • Fu, Haohuan; Liu, Weiguo; Wang, Lanning
  • Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis on - SC '17 https://doi.org/10.1145/3126908.3126909
conference January 2017
Can far memory improve job throughput? conference April 2020
Rethinking software runtimes for disaggregated memory
  • Calciu, Irina; Imran, M. Talha; Puddu, Ivan
  • Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems https://doi.org/10.1145/3445814.3446713
conference April 2021
Clio: a hardware-software co-designed disaggregated memory system
  • Guo, Zhiyuan; Shan, Yizhou; Luo, Xuhao
  • Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems https://doi.org/10.1145/3503222.3507762
conference February 2022
A Case For Intra-rack Resource Disaggregation in HPC
  • Michelogiannakis, George; Klenk, Benjamin; Cook, Brandon
  • ACM Transactions on Architecture and Code Optimization, Vol. 19, Issue 2 https://doi.org/10.1145/3514245
journal June 2022
Pond: CXL-Based Memory Pooling Systems for Cloud Platforms
  • Li, Huaicheng; Berger, Daniel S.; Hsu, Lisa
  • Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2 https://doi.org/10.1145/3575693.3578835
conference January 2023
The high-speed networks of the Summit and Sierra supercomputers journal May 2020
ADEPT: a domain independent sequence alignment strategy for gpu architectures journal September 2020
Optically Disaggregated Data Centers With Minimal Remote Memory Latency: Technologies, Architectures, and Resource Allocation [Invited] journal January 2018

Similar Records

A Case For Intra-rack Resource Disaggregation in HPC
Journal Article · March 2022 · ACM Transactions on Architecture and Code Optimization · OSTI ID: 1878112

The Institute for Sustained Performance, Energy, and Resilience
Technical Report · November 2019 · OSTI ID: 1481285

Opal: A Centralized Memory Manager for Investigating Disaggregated Memory Systems
Technical Report · August 2018 · OSTI ID: 1467164
