Performance analysis of distributed symmetric sparse matrix vector multiplication algorithm for multi-core architectures

Oryspayev, Dossay; Aktulga, Hasan Metin; Sosonkina, Masha; Maris, Pieter; Vary, James P.

doi:10.1002/cpe.3499

Journal Article · July 14, 2015 · Concurrency and Computation. Practice and Experience

Oryspayev, Dossay ^[1]; Aktulga, Hasan Metin ^[2]; Sosonkina, Masha ^[3]; Maris, Pieter ^[1]; Vary, James P. ^[1]

[1] Iowa State Univ., Ames, IA (United States)
[2] Michigan State Univ., East Lansing, MI (United States); Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
[3] Iowa State Univ., Ames, IA (United States); Old Dominion Univ., Norfolk, VA (United States)

Sparse matrix vector multiplication (SpMVM) is an important kernel that frequently arises in high performance computing applications. Due to its low arithmetic intensity, several approaches have been proposed in the literature to improve its scalability and efficiency in large-scale computations. In this paper, our target systems are high-end multi-core architectures, and we use a hybrid Message Passing Interface (MPI) + OpenMP programming model for parallelism. We analyze the performance of a recently proposed implementation of the distributed symmetric SpMVM, originally developed for large sparse symmetric matrices arising in ab initio nuclear structure calculations. We also study important features of this implementation and compare it with previously reported implementations that do not exploit the underlying symmetry. Our SpMVM implementations leverage the hybrid paradigm to efficiently overlap expensive communications with computations. Our main comparison criterion is the "CPU core hours" metric, which is the main measure of resource usage on supercomputers. We analyze the effects of a topology-aware mapping heuristic using a simplified network load model. Furthermore, we have tested the different SpMVM implementations on two large clusters with 3D Torus and Dragonfly topologies. Our results show that the distributed SpMVM implementation that exploits matrix symmetry and hides communication yields the best value for the "CPU core hours" metric and significantly reduces data movement overheads.
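
To make concrete what "exploiting matrix symmetry" means for this kernel, the following is a minimal single-process sketch, not the implementation analyzed in the article. It assumes the symmetric matrix is stored as its upper triangle (including the diagonal) in CSR format; the function name `symmetric_spmv` and the small test matrix are hypothetical and chosen only for illustration. Each stored off-diagonal entry is used twice, once for its own row and once for the mirrored row, so roughly half of the matrix data needs to be read.

```python
def symmetric_spmv(n, row_ptr, col_idx, values, x):
    """Compute y = A x for a symmetric n-by-n matrix A.

    Assumption: only the upper triangle of A (diagonal included) is
    stored, in CSR form (row_ptr, col_idx, values).
    """
    y = [0.0] * n
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]
            a = values[k]
            y[i] += a * x[j]        # contribution of the stored entry A[i][j]
            if j != i:
                y[j] += a * x[i]    # contribution of the mirrored entry A[j][i]
    return y


# Hypothetical 3x3 example:
#     A = [[2, 1, 0],
#          [1, 3, 4],
#          [0, 4, 5]]
row_ptr = [0, 2, 4, 5]
col_idx = [0, 1, 1, 2, 2]
values = [2.0, 1.0, 3.0, 4.0, 5.0]
x = [1.0, 2.0, 3.0]

y = symmetric_spmv(3, row_ptr, col_idx, values, x)
print(y)  # [4.0, 19.0, 23.0]
```

The mirrored update writes to `y[j]` for rows other than the one being traversed. In a threaded or distributed setting such scattered writes generally need synchronization or an extra reduction step, which is one reason a symmetric kernel is harder to parallelize than one that stores the full matrix.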

Research Organization: Ames Lab., Ames, IA (United States)

Sponsoring Organization: USDOE

Grant/Contract Number: 0941434; 0904782; 1047772; AC02-07CH11358; SC0008485; FG02-87ER40371; AC02-05CH11231

OSTI ID: 1227395

Report Number(s): IS-J-8845

Journal Information: Concurrency and Computation. Practice and Experience, Vol. 27, Issue 17; ISSN 1532-0626

Publisher: Wiley

Country of Publication: United States

Language: English

Citation Metrics:

Cited by: 2 works

Citation information provided by Web of Science

Similar Records

An Asynchronous Task-based Fan-Both Sparse Cholesky Solver

Conference · January 1, 2016

Jacquelin, Mathias; Zheng, Yili; Ng, Esmond; +1 more

Accelerating an iterative eigensolver for nuclear structure configuration interaction calculations on GPUs using OpenACC

Journal Article · March 1, 2022 · Journal of Computational Science

Maris, Pieter; Yang, Chao; Oryspayev, Dossay; +1 more

A High Performance Block Eigensolver for Nuclear Configuration Interaction Calculations

Journal Article · June 1, 2017 · IEEE Transactions on Parallel and Distributed Systems

Aktulga, Hasan Metin; Afibuzzaman, Md.; Williams, Samuel; +6 more

Related Subjects

97 MATHEMATICS AND COMPUTING
distributed symmetric SpMVM
hybrid MPI/OpenMP parallelism
topology-aware mapping
reduced data movement
