
DOE PAGES

Title: High-Performance Sparse Matrix-Matrix Products on Intel KNL and Multicore Architectures

Sparse matrix-matrix multiplication (SpGEMM) is a computational primitive that is widely used in areas ranging from traditional numerical applications to recent big data analysis and machine learning. Although many SpGEMM algorithms have been proposed, hardware-specific optimizations for multi- and many-core processors are lacking, and a detailed analysis of their performance under various use cases and matrices is not available. We first identify and mitigate multiple bottlenecks with memory management and thread scheduling on Intel Xeon Phi (Knights Landing, or KNL). Specifically targeting multi- and many-core processors, we develop a hash-table-based algorithm and optimize a heap-based shared-memory SpGEMM algorithm. We examine their performance together with other publicly available codes. Unlike prior work, our evaluation also includes use cases that are representative of real graph algorithms, such as multi-source breadth-first search and triangle counting. Our hash-table-based and heap-based algorithms show significant speedups over existing libraries in the majority of cases, while other algorithms dominate in scenarios with different matrix sizes, sparsity, compression factors, and operation types. We distill our in-depth evaluation results into a recipe for choosing the best SpGEMM algorithm for a target scenario. In conclusion, a critical finding is that hash-table-based SpGEMM gets a significant performance boost if the nonzeros are not required to be sorted within each row of the output matrix.
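For readers unfamiliar with the hash-table-based approach summarized above, the following is a minimal, single-threaded C++ sketch of Gustavson-style SpGEMM with a per-row hash accumulator; it is not the authors' optimized KNL implementation, and the CSR struct and spgemm_hash function names are illustrative only. It also shows the sorted-versus-unsorted distinction noted in the abstract: when sort_rows is false, the per-row sort of output columns is skipped entirely.

// Minimal sketch (not the paper's code): C = A * B with both matrices in CSR,
// using a per-row hash map as the accumulator. Real implementations use flat
// open-addressing tables sized per row and thread-private buffers.
#include <algorithm>
#include <cstdio>
#include <unordered_map>
#include <utility>
#include <vector>

struct CSR {
    int rows = 0, cols = 0;
    std::vector<int>    rowptr;  // size rows + 1
    std::vector<int>    colidx;  // size nnz
    std::vector<double> val;     // size nnz
};

CSR spgemm_hash(const CSR& A, const CSR& B, bool sort_rows = false) {
    CSR C;
    C.rows = A.rows;
    C.cols = B.cols;
    C.rowptr.assign(A.rows + 1, 0);

    for (int i = 0; i < A.rows; ++i) {
        std::unordered_map<int, double> acc;          // output column -> partial sum
        for (int k = A.rowptr[i]; k < A.rowptr[i + 1]; ++k) {
            int    acol = A.colidx[k];
            double aval = A.val[k];
            // Scatter the scaled row of B into the hash-table accumulator.
            for (int j = B.rowptr[acol]; j < B.rowptr[acol + 1]; ++j)
                acc[B.colidx[j]] += aval * B.val[j];
        }
        std::vector<std::pair<int, double>> row(acc.begin(), acc.end());
        if (sort_rows)                                 // optional: column-sorted output rows
            std::sort(row.begin(), row.end());
        for (auto& [c, v] : row) { C.colidx.push_back(c); C.val.push_back(v); }
        C.rowptr[i + 1] = static_cast<int>(C.colidx.size());
    }
    return C;
}

int main() {
    // 2x2 identity times itself, just to exercise the sketch.
    CSR I{2, 2, {0, 1, 2}, {0, 1}, {1.0, 1.0}};
    CSR C = spgemm_hash(I, I);
    std::printf("nnz(C) = %d\n", static_cast<int>(C.colidx.size()));
    return 0;
}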
Authors:
 Nagasaka, Yusuke [1]; Matsuoka, Satoshi [2]; Azad, Ariful [3]; Buluç, Aydın [3]
  1. Tokyo Institute of Technology, Tokyo (Japan)
  2. RIKEN Center for Computational Science, Kobe (Japan)
  3. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Publication Date:
Grant/Contract Number:
AC02-05CH11231
Type:
Accepted Manuscript
Journal Name:
arXiv.org Repository
Additional Journal Information:
Journal Volume: 2018; Related Information: https://arxiv.org/abs/1804.01698; Journal ID: ISSN 9999-0017
Publisher:
Cornell University
Research Org:
Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Sponsoring Org:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR) (SC-21)
Country of Publication:
United States
Language:
English
OSTI Identifier:
1454499

Nagasaka, Yusuke, Matsuoka, Satoshi, Azad, Ariful, and Buluç, Aydın. High-Performance Sparse Matrix-Matrix Products on Intel KNL and Multicore Architectures. United States: N. p., Web. doi:10.1145/3229710.3229720.
Nagasaka, Yusuke, Matsuoka, Satoshi, Azad, Ariful, & Buluç, Aydın. High-Performance Sparse Matrix-Matrix Products on Intel KNL and Multicore Architectures. United States. doi:10.1145/3229710.3229720.
Nagasaka, Yusuke, Matsuoka, Satoshi, Azad, Ariful, and Buluç, Aydın. 2018. "High-Performance Sparse Matrix-Matrix Products on Intel KNL and Multicore Architectures". United States. doi:10.1145/3229710.3229720. https://www.osti.gov/servlets/purl/1454499.
@article{osti_1454499,
title = {High-Performance Sparse Matrix-Matrix Products on Intel KNL and Multicore Architectures},
author = {Nagasaka, Yusuke and Matsuoka, Satoshi and Azad, Ariful and Buluç, Aydın},
abstractNote = {Sparse matrix-matrix multiplication (SpGEMM) is a computational primitive that is widely used in areas ranging from traditional numerical applications to recent big data analysis and machine learning. Although many SpGEMM algorithms have been proposed, hardware-specific optimizations for multi- and many-core processors are lacking, and a detailed analysis of their performance under various use cases and matrices is not available. We first identify and mitigate multiple bottlenecks with memory management and thread scheduling on Intel Xeon Phi (Knights Landing, or KNL). Specifically targeting multi- and many-core processors, we develop a hash-table-based algorithm and optimize a heap-based shared-memory SpGEMM algorithm. We examine their performance together with other publicly available codes. Unlike prior work, our evaluation also includes use cases that are representative of real graph algorithms, such as multi-source breadth-first search and triangle counting. Our hash-table-based and heap-based algorithms show significant speedups over existing libraries in the majority of cases, while other algorithms dominate in scenarios with different matrix sizes, sparsity, compression factors, and operation types. We distill our in-depth evaluation results into a recipe for choosing the best SpGEMM algorithm for a target scenario. In conclusion, a critical finding is that hash-table-based SpGEMM gets a significant performance boost if the nonzeros are not required to be sorted within each row of the output matrix.},
doi = {10.1145/3229710.3229720},
journal = {arXiv.org Repository},
volume = {2018},
place = {United States},
year = {2018},
month = {1}
}