Summary: Emmerald: A Fast Matrix-Matrix Multiply
Using Intel's SSE Instructions
Douglas Aberdeen
Research School of Information Sciences and Engineering
Australian National University
daa@csl.anu.edu.au
Jonathan Baxter
Research School of Information Sciences and Engineering
Australian National University
Jonathan.Baxter@anu.edu.au
August 26, 2000
Abstract
Generalised matrix-matrix multiplication forms the kernel of many mathematical
algorithms, hence a faster matrix-matrix multiply immediately benefits these
algorithms. In this paper we implement efficient matrix multiplication for large
matrices using the Intel Pentium single instruction multiple data (SIMD)
floating-point architecture. The main difficulty with the Pentium and other commodity
processors is the need to efficiently utilize the cache hierarchy, particularly given the
growing gap between main-memory and CPU clock speeds. We give a detailed
description of the register allocation, Level 1 and Level 2 cache blocking strategies