OSTI.GOV: U.S. Department of Energy
Office of Scientific and Technical Information

Title: An Application Specific Memory Characterization Technique for Co-processor Accelerators

Abstract

Commodity accelerator technologies including reconfigurable devices and graphical processing units (GPUs) provide an order of magnitude performance improvement compared to mainstream microprocessor systems. A number of compute-intensive, scientific applications, therefore, can potentially benefit from commodity computing devices available in the form of co-processor accelerators. However, there has been little progress in accelerating production-level scientific applications using these technologies due to several programming and performance challenges. One of the key performance challenges is performance sustainability. While computation is often accelerated substantially by accelerator devices, the achievable performance is significantly lower once the data transfer costs and overheads are incorporated. We present an application-specific memory characterization technique for an FPGA-accelerated system that enabled us to reduce data transfer overhead for a scientific application by a factor of 5. We classify large data structures in the application according to their read and write characteristics and access patterns. This classification in turn enabled us to sustain a speedup of over three for a full-scale scientific application. Our proposed technique extends to applications that exhibit similar memory behavior and to co-processor accelerator systems that support data streaming and pipelining, and allow overlapped execution between the host and the accelerator device.

Authors:
 Alam, Sadaf R [1]; Smith, Melissa C [1]; Vetter, Jeffrey S [1]
  1. ORNL
Publication Date:
Research Org.:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Sponsoring Org.:
USDOE Laboratory Directed Research and Development (LDRD) Program
OSTI Identifier:
931800
DOE Contract Number:  
DE-AC05-00OR22725
Resource Type:
Conference
Resource Relation:
Conference: IEEE 18th International Conference on Application-specific Systems, Architectures and Processors, Montreal, Canada, July 9-11, 2007
Country of Publication:
United States
Language:
English
Subject:
97; 99 GENERAL AND MISCELLANEOUS//MATHEMATICS, COMPUTING, AND INFORMATION SCIENCE; COMPUTER NETWORKS; PERFORMANCE; PROGRAMMING; MEMORY MANAGEMENT; DATA TRANSMISSION; TIME DEPENDENCE

Citation Formats

Alam, Sadaf R, Smith, Melissa C, and Vetter, Jeffrey S. An Application Specific Memory Characterization Technique for Co-processor Accelerators. United States: N. p., 2007. Web.
Alam, Sadaf R, Smith, Melissa C, & Vetter, Jeffrey S. (2007). An Application Specific Memory Characterization Technique for Co-processor Accelerators. United States.
Alam, Sadaf R, Smith, Melissa C, and Vetter, Jeffrey S. 2007. "An Application Specific Memory Characterization Technique for Co-processor Accelerators". United States.
@article{osti_931800,
title = {An Application Specific Memory Characterization Technique for Co-processor Accelerators},
author = {Alam, Sadaf R and Smith, Melissa C and Vetter, Jeffrey S},
abstractNote = {Commodity accelerator technologies including reconfigurable devices and graphical processing units (GPUs) provide an order of magnitude performance improvement compared to mainstream microprocessor systems. A number of compute-intensive, scientific applications, therefore, can potentially benefit from commodity computing devices available in the form of co-processor accelerators. However, there has been little progress in accelerating production-level scientific applications using these technologies due to several programming and performance challenges. One of the key performance challenges is performance sustainability. While computation is often accelerated substantially by accelerator devices, the achievable performance is significantly lower once the data transfer costs and overheads are incorporated. We present an application-specific memory characterization technique for an FPGA-accelerated system that enabled us to reduce data transfer overhead for a scientific application by a factor of 5. We classify large data structures in the application according to their read and write characteristics and access patterns. This classification in turn enabled us to sustain a speedup of over three for a full-scale scientific application. Our proposed technique extends to applications that exhibit similar memory behavior and to co-processor accelerator systems that support data streaming and pipelining, and allow overlapped execution between the host and the accelerator device.},
place = {United States},
year = {2007},
month = {1}
}

