Title: Architectures and APIs: Assessing Requirements for Delivering FPGA Performance to Applications.

Abstract

Abstract not provided.

Authors:
Underwood, Keith D; Ulmer, Craig D.; Hemmert, Karl Scott
Publication Date:
2006-05-01
Research Org.:
Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
Sponsoring Org.:
USDOE National Nuclear Security Administration (NNSA)
OSTI Identifier:
1264013
Report Number(s):
SAND2006-3015C
525717
DOE Contract Number:
AC04-94AL85000
Resource Type:
Conference
Resource Relation:
Conference: Proposed for presentation at the SC 2006 held November 11-17, 2006 in Ft. Lauderdale, FL.
Country of Publication:
United States
Language:
English

Citation Formats

Underwood, Keith D, Ulmer, Craig D., and Hemmert, Karl Scott. Architectures and APIs: Assessing Requirements for Delivering FPGA Performance to Applications. United States: N. p., 2006. Web.
Underwood, Keith D, Ulmer, Craig D., & Hemmert, Karl Scott. (2006). Architectures and APIs: Assessing Requirements for Delivering FPGA Performance to Applications. United States.
Underwood, Keith D, Ulmer, Craig D., and Hemmert, Karl Scott. 2006. "Architectures and APIs: Assessing Requirements for Delivering FPGA Performance to Applications." United States. https://www.osti.gov/servlets/purl/1264013.
@article{osti_1264013,
title = {Architectures and APIs: Assessing Requirements for Delivering FPGA Performance to Applications.},
author = {Underwood, Keith D and Ulmer, Craig D. and Hemmert, Karl Scott},
abstractNote = {Abstract not provided.},
doi = {},
journal = {},
number = {},
volume = {},
place = {United States},
year = {2006},
month = {May}
}


  • At the forefront of recent HPC innovations are Field Programmable Gate Arrays (FPGAs), which promise to accelerate calculations by one or more orders of magnitude. The performance of two Cray XD1 systems, with Virtex-II Pro 50 and Virtex-4 LX160 FPGAs, was evaluated using a computational biology program for human genome comparison. This paper describes scalable, parallel, FPGA-accelerated results for the FASTA application ssearch34, using the Smith-Waterman algorithm for DNA, RNA, and protein sequencing, contained in the OpenFPGA benchmark suite. Results indicate typical Cray XD1 FPGA speedups of 50x (Virtex-II Pro 50) and 100x (Virtex-4 LX160) compared to a 2.2 GHz Opteron. Similar speedups are expected for the DRC RPU110-L200 modules (Virtex-4 LX200), which fit in an Opteron socket and were selected by Cray for its XT supercomputers. The FPGA programming challenges, human genome benchmarking, and data verification of results are discussed.
  • The Cray T3D, a MIMD system with NUMA shared memory capabilities and, in principle, very low communications latency, can support the Canopy framework for grid-oriented applications. Canopy has been ported to the T3D, with the intent of making it available to a spectrum of users. The performance of the T3D running Canopy has been benchmarked on five QCD applications extensively run on ACPMAPS at Fermilab, requiring a variety of data access patterns. The net performance and scaling behavior reveal an efficiency relative to peak Gflops almost identical to that achieved on ACPMAPS. Detailed studies of the major factors impacting performance are presented. Generalizations applying this analysis to the newly emerging crop of commercial systems reveal where their limitations will lie. On these applications, efficiencies above 25% are not to be expected; eliminating overheads due to Canopy will improve matters, but by less than a factor of two.
  • The authors develop a model for the parallel performance of algorithms that consist of concurrent, two-dimensional wavefronts implemented in a message passing environment. The model, based on a LogGP machine parameterization, combines the separate contributions of computation and communication wavefronts. They validate the model on three important supercomputer systems, on up to 500 processors. They use data from a deterministic particle transport application taken from the ASCI workload, although the model is general to any wavefront algorithm implemented on a 2-D processor domain. They also use the validated model to estimate the performance and scalability of wavefront algorithms on 100-TFLOPS computer systems expected to be in existence within the next decade as part of the ASCI program and elsewhere. In this context, they analyze two problem sizes. The model shows that on the largest such problem (1 billion cells), inter-processor communication performance is not the bottleneck. Single-node efficiency is the dominant factor.