Parallel matrix transpose algorithms on distributed memory concurrent computers
- Tennessee Univ., Knoxville, TN (United States)
- Oak Ridge National Lab., TN (United States)
This paper describes parallel matrix transpose algorithms on distributed memory concurrent processors. We assume that the matrix is distributed over a P × Q processor template with a block scattered data distribution. P, Q, and the block size can be arbitrary, so the algorithms have wide applicability. The algorithms make use of nonblocking, point-to-point communication between processors. The use of nonblocking communication allows a processor to overlap the messages that it sends to different processors, thereby avoiding unnecessary synchronization. Combined with the matrix multiplication routine, C = A⋅B, the algorithms are used to compute parallel multiplications of transposed matrices, C = Aᵀ⋅Bᵀ, in the PUMMA package. Details of the parallel implementation of the algorithms are given, and results are presented for runs on the Intel Touchstone Delta computer.
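Under a block scattered (block-cyclic) distribution, global block (I, J) is typically assigned to the processor at template coordinates (I mod P, J mod Q); a transpose moves that block to position (J, I), which in general lives on a different processor, and this mismatch is what drives the communication pattern. A minimal sketch of that ownership mapping, assuming the conventional block-cyclic assignment (function names are illustrative, not from the paper):

```python
def owner(I, J, P, Q):
    """Processor coordinates owning global block (I, J) under a
    block-scattered (block-cyclic) distribution on a P x Q template."""
    return (I % P, J % Q)

def transpose_target(I, J, P, Q):
    """After transposing, block (I, J) becomes block (J, I);
    return the processor that must receive it."""
    return owner(J, I, P, Q)

# Example: a 2 x 3 processor template.
P, Q = 2, 3
# Block (1, 2) lives on processor (1, 2); after the transpose it is
# block (2, 1), owned by processor (0, 1), so a message is required.
print(owner(1, 2, P, Q))             # (1, 2)
print(transpose_target(1, 2, P, Q))  # (0, 1)
```

Because each processor must ship many such blocks to many distinct destinations, issuing the sends as nonblocking operations lets them proceed concurrently rather than serializing on each partner, which is the overlap the abstract describes.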
- Research Organization:
- Oak Ridge National Lab., TN (United States)
- Sponsoring Organization:
- USDOE, Washington, DC (United States); Defense Advanced Research Projects Agency, Arlington, VA (United States); Department of the Air Force, Washington, DC (United States)
- DOE Contract Number:
- AC05-84OR21400
- OSTI ID:
- 233300
- Report Number(s):
- CONF-9310220--8; ON: DE96010006; CNN: Contract DAAL03-91-C-0047
- Country of Publication:
- United States
- Language:
- English
Similar Records
PUMMA: Parallel Universal Matrix Multiplication Algorithms on distributed memory concurrent computers
The spectral decomposition of nonsymmetric matrices on distributed memory parallel computers