Summary: LAPACK Working Note 111, UTK, http://www.netlib.org/lapack/lawns
Optimizing Matrix Multiply using PHiPAC: a Portable,
High-Performance, ANSI C Coding Methodology
Jeff Bilmes, Krste Asanovic, Jim Demmel, Dominic Lam, Chee-Whye Chin
August 8, 1996
Abstract
BLAS3 operations have great potential for aggressive optimization. Unfortunately, they
usually need to be hand-coded for a specific machine and compiler to achieve near-peak
performance. We have developed a methodology whereby near-peak performance on a wide range
of systems can be achieved automatically for such routines. First, by analyzing current
machines and C compilers, we've developed guidelines for writing Portable, High-Performance,
ANSI C (PHiPAC, pronounced "fee-pack"). Second, rather than code by hand, we produce
parameterized code generators. Third, we write search scripts that find the best parameters
for a given system. We report on a BLAS GEMM compatible multi-level cache-blocked matrix
multiply generator that produces code achieving performance in excess of 90% of peak on the
Sparcstation-20/61, IBM RS/6000-590, and HP 712/80i, and 80% of peak on the SGI Indigo R4k.
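
To make the idea of a parameterized, cache-blocked multiply concrete, here is a minimal
sketch in ANSI C. It is an illustration only, not code emitted by the PHiPAC generator: the
function name blocked_matmul, the helper min_sz, the square row-major layout, and the single
block-size parameter bs are all assumptions made for this example, and it blocks for one
level only, whereas the generator described above blocks for multiple levels.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: the smaller of two sizes. */
static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/*
 * Computes C += A * B for row-major n x n matrices.
 * The loops are tiled by `bs`, the kind of tunable parameter
 * that a search script could vary per machine (an assumption
 * for this sketch; the real generator exposes more parameters).
 */
void blocked_matmul(size_t n, size_t bs,
                    const double *A, const double *B, double *C)
{
    size_t ii, kk, jj, i, k, j;

    assert(bs > 0);
    for (ii = 0; ii < n; ii += bs) {
        size_t i_end = min_sz(ii + bs, n);
        for (kk = 0; kk < n; kk += bs) {
            size_t k_end = min_sz(kk + bs, n);
            for (jj = 0; jj < n; jj += bs) {
                size_t j_end = min_sz(jj + bs, n);
                /* Multiply one tile of A by one tile of B. */
                for (i = ii; i < i_end; i++) {
                    for (k = kk; k < k_end; k++) {
                        double a = A[i * n + k];
                        for (j = jj; j < j_end; j++)
                            C[i * n + j] += a * B[k * n + j];
                    }
                }
            }
        }
    }
}
```

Any positive bs gives the same product; only the memory access pattern changes. A search
over bs would time calls such as blocked_matmul(n, bs, A, B, C) for several candidate
values and keep the fastest one for the target system.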