Summary: Portable Parallel Implementation of BLAS3
Appeared in Concurrency: Practice and Experience, Vol. 6, No. 5, pp. 411-459, 1994
Dganit Amitai
Amir Averbuch
Ronen Friedman
Eran Gabber
Department of Computer Science
School of Mathematical Sciences
Sackler Faculty of Exact Sciences
Tel-Aviv University
Tel-Aviv 69978, Israel
Abstract
Multiprocessor systems offer capabilities fast enough for handling large numerical
tasks, such as real-life linear algebra problems. Yet programming such systems has
proven to be awkward, error prone, and architecture specific.
One successful method for alleviating this problem, a method that worked well in the
case of the massively pipelined supercomputers, is to use subprogram libraries. These
libraries are built to perform some basic operations efficiently, while hiding low-level
system specifics from the programmer. Efficiently porting a library to new hardware,
be it a vector machine or a shared-memory or message-passing based multiprocessor,