 
Summary: Cache-Oblivious Algorithms
EXTENDED ABSTRACT
Matteo Frigo, Charles E. Leiserson, Harald Prokop, Sridhar Ramachandran
MIT Laboratory for Computer Science, 545 Technology Square, Cambridge, MA 02139
{athena,cel,prokop,sridhar}@supertech.lcs.mit.edu
Abstract
This paper presents asymptotically optimal algorithms
for rectangular matrix transpose, FFT, and sorting on
computers with multiple levels of caching. Unlike previous
optimal algorithms, these algorithms are cache oblivious: no
variables dependent on hardware parameters, such as cache
size and cache-line length, need to be tuned to achieve optimality.
Nevertheless, these algorithms use an optimal amount
of work and move data optimally among multiple levels of
cache. For a cache with size $Z$ and cache-line length $L$, where
$Z = \Omega(L^2)$, the number of cache misses for an $m \times n$ matrix
transpose is $\Theta(1 + mn/L)$. The number of cache misses
for either an $n$-point FFT or the sorting of $n$ numbers is
$\Theta(1 + (n/L)(1 + \log_Z n))$. We also give an $\Theta(mnp)$-work
algorithm to multiply an $m \times n$ matrix by an $n \times p$ matrix that
incurs 1+ mn+ np+ mp=L + mnp=L
