OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: A Flexible-blocking Based Approach for Performance Tuning of Matrix Multiplication Routines for Large Matrices with Edge Cases

Abstract

Efficient and scalable matrix operations are being highly demanding in the recent era of Machine Learning, Deep Learning, and Big Data Analytics. The two commonly used matrix-matrix operations in the Basic Linear Algebra Subprograms (BLAS) specification are General Matrix-Matrix multiplication (GEMM) and Symmetric Rank-k update (SYRK). The SYRK routine is a specialization of the GEMM routine, where half of the multiplications are skipped as the resultant matrix is known to be symmetric. Fortunately, several linear algebra libraries implement these BLAS routines quite efficiently. The libraries usually partition the input matrices into blocks and place them in processor caches, thus improving performance by leveraging the caches. However, the contemporary libraries are highly optimized for squarish matrices, but the performance degrades significantly for the matrices with edge case (strictly thin or strictly fat shapes) in the multicore machine. The primary reason is that the current state-of-the-art libraries make fixed block shapes based on a processor architecture, and do not consider the shape of the input matrices. In this paper, we propose a new blocking approach, we name it Flexible-blocking, to mitigate the scalability issues. In contrast to the contemporary libraries, our approach formulates the blocks of the input matrices based on the shapes of the matrices as well as the number of threads used in the implementation. Our proposed technique shows noticeable performance improvement on multicore shared-memory machines for the edge case matrices.

Authors:
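Hossain, Md Mosharaf [1]; Hines, Thomas M. [1]; Ghafoor, Sheikh [1]; Rabiul islam, Sheikh [1]; Kannan, Ramakrishnan [2]; Sukumar, Sreenivas R. [3]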
  1. Tennessee Technological University (TTU)
  2. ORNL
  3. Cray, Inc.
Publication Date:
Research Org.:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1557472
DOE Contract Number:  
AC05-00OR22725
Resource Type:
Conference
Resource Relation:
Conference: 2018 IEEE International Conference on Big Data (Big Data), Seattle, Washington, United States of America, December 10-13, 2018
Country of Publication:
United States
Language:
English

Citation Formats

Hossain, Md Mosharaf, Hines, Thomas M., Ghafoor, Sheikh, Rabiul islam, Sheikh, Kannan, Ramakrishnan, and Sukumar, Sreenivas R. A Flexible-blocking Based Approach for Performance Tuning of Matrix Multiplication Routines for Large Matrices with Edge Cases. United States: N. p., 2018. Web. doi:10.1109/BigData.2018.8622013.
Hossain, Md Mosharaf, Hines, Thomas M., Ghafoor, Sheikh, Rabiul islam, Sheikh, Kannan, Ramakrishnan, & Sukumar, Sreenivas R. A Flexible-blocking Based Approach for Performance Tuning of Matrix Multiplication Routines for Large Matrices with Edge Cases. United States. https://doi.org/10.1109/BigData.2018.8622013
Hossain, Md Mosharaf, Hines, Thomas M., Ghafoor, Sheikh, Rabiul islam, Sheikh, Kannan, Ramakrishnan, and Sukumar, Sreenivas R. 2018. "A Flexible-blocking Based Approach for Performance Tuning of Matrix Multiplication Routines for Large Matrices with Edge Cases". United States. https://doi.org/10.1109/BigData.2018.8622013. https://www.osti.gov/servlets/purl/1557472.
@article{osti_1557472,
title = {A Flexible-blocking Based Approach for Performance Tuning of Matrix Multiplication Routines for Large Matrices with Edge Cases},
author = {Hossain, Md Mosharaf and Hines, Thomas M. and Ghafoor, Sheikh and Rabiul islam, Sheikh and Kannan, Ramakrishnan and Sukumar, Sreenivas R.},
abstractNote = {Efficient and scalable matrix operations are being highly demanding in the recent era of Machine Learning, Deep Learning, and Big Data Analytics. The two commonly used matrix-matrix operations in the Basic Linear Algebra Subprograms (BLAS) specification are General Matrix-Matrix multiplication (GEMM) and Symmetric Rank-k update (SYRK). The SYRK routine is a specialization of the GEMM routine, where half of the multiplications are skipped as the resultant matrix is known to be symmetric. Fortunately, several linear algebra libraries implement these BLAS routines quite efficiently. The libraries usually partition the input matrices into blocks and place them in processor caches, thus improving performance by leveraging the caches. However, the contemporary libraries are highly optimized for squarish matrices, but the performance degrades significantly for the matrices with edge case (strictly thin or strictly fat shapes) in the multicore machine. The primary reason is that the current state-of-the-art libraries make fixed block shapes based on a processor architecture, and do not consider the shape of the input matrices. In this paper, we propose a new blocking approach, we name it Flexible-blocking, to mitigate the scalability issues. In contrast to the contemporary libraries, our approach formulates the blocks of the input matrices based on the shapes of the matrices as well as the number of threads used in the implementation. Our proposed technique shows noticeable performance improvement on multicore shared-memory machines for the edge case matrices.},
doi = {10.1109/BigData.2018.8622013},
url = {https://www.osti.gov/biblio/1557472},
place = {United States},
year = {2018},
month = {12}
}
