Memory Efficient Parallel Matrix Multiplication Operation for Irregular Problems
Regular distributions for storing dense matrices on parallel systems are not always used in practice. In many scientific applications, the matrix distribution follows the underlying physical problem and may involve variable block sizes on individual processors. This paper describes a generalization of the Shared and Remote-memory based Universal Matrix Multiplication Algorithm (SRUMMA) [1] that handles irregularly distributed matrices. Our approach relies on a distribution-independent algorithm that provides dynamic load balancing by exploiting data locality. It matches the performance of the traditional approach to the irregular case, which relies on temporary arrays with a regular distribution, data redistribution, and matrix multiplication for regular matrices. The proposed algorithm is memory-efficient because no temporary matrices are needed; this feature is critical for systems such as the IBM Blue Gene/L, which offer a very limited amount of memory per node. The experimental results demonstrate very good performance across a range of matrix distributions and problem sizes motivated by real applications.
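To illustrate the idea behind the abstract, the following is a minimal sketch (not the paper's SRUMMA implementation): a matrix multiplication driven directly by an irregular block partitioning, where each output block accumulates products of variably sized sub-blocks. All function names and the example block sizes are hypothetical; on a real parallel system the inner block products would operate on blocks fetched via shared- or remote-memory copies rather than on a local list-of-lists matrix, and no regularly distributed temporary arrays are needed.

```python
def matmul(a, b):
    """Reference dense multiply of two list-of-lists matrices."""
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]

def block_ranges(sizes):
    """Turn variable block sizes, e.g. [2, 3, 1], into index ranges."""
    ranges, start = [], 0
    for s in sizes:
        ranges.append(range(start, start + s))
        start += s
    return ranges

def irregular_matmul(a, b, row_sizes, inner_sizes, col_sizes):
    """Multiply A and B block by block under an irregular partitioning.

    Each (row block, column block) pair of C is computed by accumulating
    the products of the matching irregularly sized blocks of A and B,
    mimicking an owner-computes loop over irregularly distributed data.
    """
    n, m = len(a), len(b[0])
    c = [[0] * m for _ in range(n)]
    for ri in block_ranges(row_sizes):          # C row blocks
        for cj in block_ranges(col_sizes):      # C column blocks
            for kp in block_ranges(inner_sizes):  # shared inner dimension
                # Local block product; in a distributed setting these A
                # and B blocks would be remote and fetched on demand.
                for i in ri:
                    for j in cj:
                        c[i][j] += sum(a[i][p] * b[p][j] for p in kp)
    return c
```

Because the loop structure is expressed purely in terms of the block index ranges, the same code handles any variable block sizes, which is the sense in which the algorithm is distribution independent.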
- Research Organization:
- Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
- Sponsoring Organization:
- USDOE
- DOE Contract Number:
- AC05-76RL01830
- OSTI ID:
- 887376
- Report Number(s):
- PNNL-SA-47061; KP1704020; TRN: US200618
- Resource Relation:
- Conference: Proceedings of the 3rd Conference on Computing Frontiers (CF '06), Ischia, Italy, May 3-5, 2006, pp. 229-240
- Country of Publication:
- United States
- Language:
- English