Sunder: A programmable hardware prefetch architecture for numerical loops
- State Univ. of New York, Stony Brook, NY (United States). Computer Science Dept.
Beyond data caching, data prefetching is by far the most effective way to address the memory access bottleneck associated with high-performance processors. This is particularly true for scientific programs whose working sets cannot easily fit into the on-chip data cache. This paper proposes a new data prefetching architecture called Sunder, which combines the flexibility and accuracy of software prefetching with the transparency and low overhead of hardware prefetching. The heart of the design is a dedicated Prefetch Engine that software can program at run time. An important design decision is to keep the Prefetch Engine completely isolated from the normal instruction execution pipeline, except for a loop counter that keeps the two synchronized at loop-iteration boundaries. A detailed simulation study of the Sunder architecture shows that it achieves an average relative performance advantage of 28% to 46% over cache-only architectures, with smaller cache block sizes yielding greater improvement.
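The abstract does not describe the Prefetch Engine's programming interface, so the following is only a toy model of the general idea: software programs the engine before a loop, and the engine afterward sees nothing but the loop counter. Every name here (`StreamDescriptor`, `PrefetchEngine`, `count_misses`) and the base/stride/distance descriptor format are hypothetical assumptions, not Sunder's actual design. The cache is idealized (unbounded, no evictions, prefetches complete instantly), so the miss counts it prints describe this sketch only and are not results from the paper.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StreamDescriptor:
    """Hypothetical descriptor written by software before a loop starts:
    one per array referenced with a constant stride."""
    base: int      # byte address touched in iteration 0
    stride: int    # bytes advanced per loop iteration
    distance: int  # how many iterations ahead to prefetch


@dataclass
class PrefetchEngine:
    """Toy model: the engine never sees instructions, only the loop counter."""
    block_size: int
    streams: List[StreamDescriptor] = field(default_factory=list)
    trip_count: int = 0

    def program(self, streams: List[StreamDescriptor], trip_count: int) -> None:
        # Run-time programming step, performed by software ahead of the loop.
        self.streams = list(streams)
        self.trip_count = trip_count

    def on_iteration(self, i: int) -> List[int]:
        # Called at each iteration boundary with the shared loop counter i.
        # Returns cache block numbers to prefetch for iteration i + distance.
        blocks = []
        for s in self.streams:
            target = i + s.distance
            if target < self.trip_count:  # do not prefetch past the loop bound
                blocks.append((s.base + target * s.stride) // self.block_size)
        return blocks


def count_misses(streams, trip_count, block_size, use_engine):
    """Count demand misses in an idealized cache (unbounded, no evictions,
    prefetched blocks arrive before the next demand access)."""
    cache = set()
    engine = PrefetchEngine(block_size)
    if use_engine:
        engine.program(streams, trip_count)
    misses = 0
    for i in range(trip_count):
        if use_engine:
            cache.update(engine.on_iteration(i))
        for s in streams:
            block = (s.base + i * s.stride) // block_size
            if block not in cache:
                misses += 1
                cache.add(block)
    return misses


# Two arrays of 8-byte doubles, as in: for i in range(64): a[i] += b[i]
streams = [StreamDescriptor(base=0, stride=8, distance=4),
           StreamDescriptor(base=4096, stride=8, distance=4)]
print(count_misses(streams, 64, 32, use_engine=False))  # 32
print(count_misses(streams, 64, 32, use_engine=True))   # 2
```

In this sketch `on_iteration` receives only the counter value, mirroring the isolation the abstract describes: the engine computes future addresses from its programmed descriptors rather than by observing loads in the pipeline.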
- OSTI ID: 87653
- Report Number(s): CONF-941118--; ISBN 0-8186-6605-6
- Country of Publication: United States
- Language: English
Similar Records
Data prefetching in shared memory multiprocessors
Programmable stream prefetch with resource optimization