Emmerald: A Fast Matrix-Matrix Multiply Using Intel's SSE Instructions
 

Summary: Emmerald: A Fast Matrix-Matrix Multiply Using Intel's SSE Instructions
Douglas Aberdeen
Research School of Information Sciences and Engineering
Australian National University
daa@csl.anu.edu.au
Jonathan Baxter
Research School of Information Sciences and Engineering
Australian National University
Jonathan.Baxter@anu.edu.au
August 26, 2000
Abstract
Generalised matrix-matrix multiplication forms the kernel of many mathematical
algorithms, hence a faster matrix-matrix multiply immediately benefits these
algorithms. In this paper we implement efficient matrix multiplication for large
matrices using the Intel Pentium single instruction multiple data (SIMD) floating
point architecture. The main difficulty with the Pentium and other commodity
processors is the need to efficiently utilize the cache hierarchy, particularly given the
growing gap between main-memory and CPU clock speeds. We give a detailed
description of the register allocation, Level 1 and Level 2 cache blocking strategies

  

Source: Aberdeen, Douglas - National ICT Australia & Computer Sciences Laboratory, Australian National University

 

Collections: Computer Technologies and Information Sciences