OSTI.GOV: U.S. Department of Energy, Office of Scientific and Technical Information

Title: High-bandwidth prefetcher for high-bandwidth memory

Abstract

A method for prefetching data into a cache is provided. The method allocates an outstanding request buffer ("ORB"). The method stores in an address field of the ORB an address and a number of blocks. The method issues prefetch requests for a degree number of blocks starting at the address. When a prefetch response is received for all the prefetch requests, the method adjusts the address of the next block to prefetch and adjusts the number of blocks remaining to be retrieved and then issues prefetch requests for a degree number of blocks starting at the adjusted address. The prefetching pauses when a maximum distance between the reads of the prefetched blocks and the last prefetched block is reached. When a read request for a prefetched block is received, the method resumes prefetching when a resume criterion is satisfied.
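The control flow in the abstract can be sketched as a small software model. This is only an illustration under stated assumptions, not the patented implementation: the names `Orb`, `Prefetcher`, `BLOCK_SIZE`, and `resume_distance` are hypothetical, reads are assumed to consume prefetched blocks in order, the distance is modelled as the count of prefetched blocks not yet read, and the resume criterion (which the abstract leaves unspecified) is assumed to be "unread blocks fall to a threshold".

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 64  # assumed cache-block size in bytes


@dataclass
class Orb:
    """Hypothetical outstanding request buffer (ORB) entry."""
    next_addr: int        # address of the next block to prefetch
    remaining: int        # blocks not yet retrieved
    outstanding: int = 0  # requests in flight for the current batch
    batch: int = 0        # size of the current batch


@dataclass
class Prefetcher:
    orb: Orb
    degree: int           # blocks requested per batch
    max_distance: int     # pause once this many prefetched blocks are unread
    resume_distance: int  # assumed resume criterion: unread blocks <= this
    prefetched: int = 0   # blocks whose responses have arrived
    reads: int = 0        # prefetched blocks consumed by demand reads
    paused: bool = False
    issued: list = field(default_factory=list)  # log of requested addresses

    def issue_batch(self) -> None:
        """Request up to `degree` blocks starting at the ORB address."""
        orb = self.orb
        if orb.remaining == 0 or orb.outstanding:
            return
        if self.prefetched - self.reads >= self.max_distance:
            self.paused = True  # too far ahead of the reader
            return
        self.paused = False
        orb.batch = min(self.degree, orb.remaining)
        orb.outstanding = orb.batch
        for i in range(orb.batch):
            self.issued.append(orb.next_addr + i * BLOCK_SIZE)

    def on_prefetch_response(self) -> None:
        """Handle one response; advance the ORB when the batch completes."""
        orb = self.orb
        orb.outstanding -= 1
        if orb.outstanding == 0:
            orb.next_addr += orb.batch * BLOCK_SIZE
            orb.remaining -= orb.batch
            self.prefetched += orb.batch
            self.issue_batch()

    def on_read(self) -> None:
        """Handle a demand read that hits a prefetched block."""
        self.reads += 1
        if self.paused and self.prefetched - self.reads <= self.resume_distance:
            self.issue_batch()
```

For example, with `degree=2`, `max_distance=4`, and `resume_distance=2`, the model issues two batches, pauses once four blocks are prefetched but unread, and issues the next batch after the second demand read. Counting unread blocks rather than comparing addresses keeps the sketch short; a hardware design would more likely track the addresses themselves.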

Inventors:
Mehta, Sanyam; Kohn, James Robert; Ernst, Daniel Jonathan; Poxon, Heidi Lynn; DeRose, Luiz
Publication Date:
April 2018
Research Org.:
Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1435638
Patent Number(s):
9,946,654
Application Number:
15/335,041
Assignee:
Cray Inc. (Seattle, WA)
DOE Contract Number:  
AC52-07NA27344; B609229
Resource Type:
Patent
Resource Relation:
Patent File Date: 2016 Oct 26
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING

Citation Formats

MLA: Mehta, Sanyam, Kohn, James Robert, Ernst, Daniel Jonathan, Poxon, Heidi Lynn, and DeRose, Luiz. High-bandwidth prefetcher for high-bandwidth memory. United States: N. p., 2018. Web.
APA: Mehta, Sanyam, Kohn, James Robert, Ernst, Daniel Jonathan, Poxon, Heidi Lynn, & DeRose, Luiz. High-bandwidth prefetcher for high-bandwidth memory. United States.
Chicago: Mehta, Sanyam, Kohn, James Robert, Ernst, Daniel Jonathan, Poxon, Heidi Lynn, and DeRose, Luiz. 2018. "High-bandwidth prefetcher for high-bandwidth memory". United States. https://www.osti.gov/servlets/purl/1435638.
@article{osti_1435638,
title = {High-bandwidth prefetcher for high-bandwidth memory},
author = {Mehta, Sanyam and Kohn, James Robert and Ernst, Daniel Jonathan and Poxon, Heidi Lynn and DeRose, Luiz},
abstractNote = {A method for prefetching data into a cache is provided. The method allocates an outstanding request buffer ("ORB"). The method stores in an address field of the ORB an address and a number of blocks. The method issues prefetch requests for a degree number of blocks starting at the address. When a prefetch response is received for all the prefetch requests, the method adjusts the address of the next block to prefetch and adjusts the number of blocks remaining to be retrieved and then issues prefetch requests for a degree number of blocks starting at the adjusted address. The prefetching pauses when a maximum distance between the reads of the prefetched blocks and the last prefetched block is reached. When a read request for a prefetched block is received, the method resumes prefetching when a resume criterion is satisfied.},
doi = {},
url = {https://www.osti.gov/biblio/1435638},
journal = {},
number = {},
volume = {},
place = {United States},
year = {2018},
month = {4}
}
