U.S. Department of Energy
Office of Scientific and Technical Information

Collective Memory Transfers for Multi-Core Chips

Technical Report · DOI: https://doi.org/10.2172/1164908 · OSTI ID: 1164908

Future performance improvements for microprocessors have shifted from clock frequency scaling towards increases in on-chip parallelism. Performance improvements for a wide variety of parallel applications require domain-decomposition of data arrays from a contiguous arrangement in memory to a tiled layout for on-chip L1 data caches and scratchpads. However, DRAM performance suffers under the non-streaming access patterns generated by many independent cores. We propose collective memory scheduling (CMS) that actively takes control of collective memory transfers such that requests arrive in a sequential and predictable fashion to the memory controller. CMS uses the hierarchically tiled arrays formalism to compactly express collective operations, which greatly improves programmability over conventional prefetch or list-DMA approaches. CMS reduces application execution time by up to 32% and DRAM read power by 2.2×, compared to a baseline DMA architecture such as STI Cell.
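
The access-pattern problem described in the abstract can be illustrated with a small sketch. It is not the report's CMS hardware or its hierarchically tiled array notation; it is a toy model that assumes a row-major array, equally sized rectangular tiles, and one tile per core. The function names (`tile_owner`, `per_core_requests`, `collective_schedule`) are hypothetical and chosen only for this illustration.

```python
def tile_owner(row, col, tile_rows, tile_cols, tiles_per_row):
    """Return the id of the core whose tile holds element (row, col)."""
    return (row // tile_rows) * tiles_per_row + (col // tile_cols)


def per_core_requests(n_rows, n_cols, tile_rows, tile_cols):
    """Addresses each core would request if it fetched its own tile independently."""
    tiles_per_row = n_cols // tile_cols
    requests = {}
    for row in range(n_rows):
        for col in range(n_cols):
            core = tile_owner(row, col, tile_rows, tile_cols, tiles_per_row)
            # Row-major linear address of element (row, col).
            requests.setdefault(core, []).append(row * n_cols + col)
    return requests


def collective_schedule(n_rows, n_cols, tile_rows, tile_cols):
    """One sequential sweep over memory; each address is tagged with its destination core."""
    tiles_per_row = n_cols // tile_cols
    return [
        (row * n_cols + col,
         tile_owner(row, col, tile_rows, tile_cols, tiles_per_row))
        for row in range(n_rows)
        for col in range(n_cols)
    ]


# A 4x4 array split into 2x2 tiles, one tile per core (assumed sizes).
requests = per_core_requests(4, 4, 2, 2)
schedule = collective_schedule(4, 4, 2, 2)
print(requests[0])   # core 0 touches addresses with a gap between rows
print(schedule[:4])  # sequential addresses, each routed to its owning core
```

In the per-core view, every core's address list jumps between rows, so independent requests from many cores interleave at the memory controller. In the collective view, memory is read once in address order and each element is routed to the core that owns it, which is the sequential, predictable arrival pattern the abstract attributes to CMS.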

Research Organization:
Ernest Orlando Lawrence Berkeley National Laboratory, Berkeley, CA (US)
Sponsoring Organization:
USDOE Office of Science (SC)
DOE Contract Number:
AC02-05CH11231
OSTI ID:
1164908
Report Number(s):
LBNL-6485E
Country of Publication:
United States
Language:
English

Similar Records

The SPUR instruction unit: An on-chip instruction cache memory for a high performance VLSI multiprocessor
Book · Wed Dec 31 23:00:00 EST 1986 · OSTI ID:5384602

The design and analysis of a high performance single chip processor
Thesis/Dissertation · Sat Dec 31 23:00:00 EST 1988 · OSTI ID:5897698

Improving Memory Subsystem Performance Using ViVA: Virtual Vector Architecture
Conference · Sun Jan 11 23:00:00 EST 2009 · OSTI ID:963537