Title: Performance analysis of distributed symmetric sparse matrix vector multiplication algorithm for multi-core architectures

Journal Article · Concurrency and Computation. Practice and Experience
DOI: https://doi.org/10.1002/cpe.3499 · OSTI ID: 1227395
 [1];  [2];  [3];  [1];  [1]
  1. Iowa State Univ., Ames, IA (United States)
  2. Michigan State Univ., East Lansing, MI (United States); Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
  3. Iowa State Univ., Ames, IA (United States); Old Dominion Univ., Norfolk, VA (United States)

Sparse matrix vector multiply (SpMVM) is an important kernel that frequently arises in high performance computing applications. Because of its low arithmetic intensity, several approaches have been proposed in the literature to improve its scalability and efficiency in large scale computations. In this paper, our target systems are high end multi-core architectures, and we use a hybrid Message Passing Interface (MPI) + OpenMP programming model for parallelism. We analyze the performance of a recently proposed implementation of the distributed symmetric SpMVM, originally developed for large sparse symmetric matrices arising in ab initio nuclear structure calculations. We also study important features of this implementation and compare it with previously reported implementations that do not exploit the underlying symmetry. Our SpMVM implementations leverage the hybrid paradigm to efficiently overlap expensive communications with computations. Our main comparison criterion is the "CPU core hours" metric, which is the main measure of resource usage on supercomputers. We analyze the effects of a topology-aware mapping heuristic using a simplified network load model. Furthermore, we have tested the different SpMVM implementations on two large clusters with 3D Torus and Dragonfly topologies. Our results show that the distributed SpMVM implementation that exploits matrix symmetry and hides communication yields the best value for the "CPU core hours" metric and significantly reduces data movement overheads.
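
To make the idea of exploiting symmetry concrete, the following is a minimal serial sketch, not the implementation analyzed in the article. It assumes the symmetric matrix is stored as its upper triangle (diagonal included) in CSR form; the function name `symmetric_spmv` and the array names `row_ptr`, `col_idx`, and `vals` are illustrative choices, not taken from the paper. Each stored off-diagonal entry is read once and used twice, which roughly halves the matrix data that has to be moved.

```python
def symmetric_spmv(n, row_ptr, col_idx, vals, x):
    """Compute y = A @ x for a symmetric A stored as its upper triangle in CSR.

    Assumption: each row i holds only entries with column j >= i.
    """
    y = [0.0] * n
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]
            a = vals[k]
            y[i] += a * x[j]       # contribution of A[i][j]
            if j != i:
                y[j] += a * x[i]   # contribution of the mirrored entry A[j][i]
    return y


# Upper triangle of the symmetric matrix
# [[4, 1, 0],
#  [1, 3, 2],
#  [0, 2, 5]]
row_ptr = [0, 2, 4, 5]
col_idx = [0, 1, 1, 2, 2]
vals = [4.0, 1.0, 3.0, 2.0, 5.0]
x = [1.0, 2.0, 3.0]

y = symmetric_spmv(3, row_ptr, col_idx, vals, x)
print(y)
```

Note that the mirrored update writes to `y[j]` for rows other than the one being processed. In a threaded or distributed setting those scattered writes need coordination (for example, private partial results that are later reduced), which is one reason a symmetric kernel is harder to parallelize than the plain one.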

Research Organization:
Ames Lab., Ames, IA (United States)
Sponsoring Organization:
USDOE
Grant/Contract Number:
0941434; 0904782; 1047772; AC02-07CH11358; SC0008485; FG02-87ER40371; AC02-05CH11231
OSTI ID:
1227395
Report Number(s):
IS-J-8845
Journal Information:
Concurrency and Computation. Practice and Experience, Vol. 27, Issue 17; ISSN 1532-0626
Publisher:
Wiley
Country of Publication:
United States
Language:
English
Citation Metrics:
Cited by: 2 works
Citation information provided by Web of Science