OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: OpenACC to FPGA: A Framework for Directive-based High-Performance Reconfigurable Computing

Abstract

This paper presents a directive-based, high-level programming framework for high-performance reconfigurable computing. It takes a standard, portable OpenACC C program as input and generates a hardware configuration file for execution on FPGAs. We implemented this prototype system using our open-source OpenARC compiler; it performs source-to-source translation and optimization of the input OpenACC program into OpenCL code, which is further compiled into an FPGA program by the backend Altera Offline OpenCL compiler. Internally, the design of OpenARC uses a high-level intermediate representation that separates concerns of program representation from underlying architectures, which facilitates portability of OpenARC. In fact, this design allowed us to create the OpenACC-to-FPGA translation framework with minimal extensions to our existing system. In addition, we show that our proposed FPGA-specific compiler optimizations and novel OpenACC pragma extensions assist the compiler in generating more efficient FPGA hardware configuration files. Our empirical evaluation on an Altera Stratix V FPGA with eight OpenACC benchmarks demonstrates the benefits of our strategy. To demonstrate the portability of OpenARC, we show results for the same benchmarks executing on other heterogeneous platforms, including NVIDIA GPUs, AMD GPUs, and Intel Xeon Phis. This initial evidence helps support the goal of using a directive-based, high-level programming strategy for performance portability across heterogeneous HPC architectures.

Authors:
 Lee, Seyong [1]; Kim, Jungwon [1]; Vetter, Jeffrey S. [1]
  1. ORNL
Publication Date:
Research Org.:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Sponsoring Org.:
USDOE Office of Science (SC)
OSTI Identifier:
1261388
DOE Contract Number:
AC05-00OR22725
Resource Type:
Conference
Resource Relation:
Conference: IEEE International Parallel and Distributed Processing Symposium (IPDPS), Chicago, IL, USA, May 23-27, 2016
Country of Publication:
United States
Language:
English

Citation Formats

Lee, Seyong, Kim, Jungwon, and Vetter, Jeffrey S. OpenACC to FPGA: A Framework for Directive-based High-Performance Reconfigurable Computing. United States: N. p., 2016. Web. doi:10.1109/IPDPS.2016.28.
Lee, Seyong, Kim, Jungwon, & Vetter, Jeffrey S. OpenACC to FPGA: A Framework for Directive-based High-Performance Reconfigurable Computing. United States. doi:10.1109/IPDPS.2016.28.
Lee, Seyong, Kim, Jungwon, and Vetter, Jeffrey S. 2016. "OpenACC to FPGA: A Framework for Directive-based High-Performance Reconfigurable Computing". United States. doi:10.1109/IPDPS.2016.28.
@article{osti_1261388,
title = {OpenACC to FPGA: A Framework for Directive-based High-Performance Reconfigurable Computing},
author = {Lee, Seyong and Kim, Jungwon and Vetter, Jeffrey S},
abstractNote = {This paper presents a directive-based, high-level programming framework for high-performance reconfigurable computing. It takes a standard, portable OpenACC C program as input and generates a hardware configuration file for execution on FPGAs. We implemented this prototype system using our open-source OpenARC compiler; it performs source-to-source translation and optimization of the input OpenACC program into OpenCL code, which is further compiled into an FPGA program by the backend Altera Offline OpenCL compiler. Internally, the design of OpenARC uses a high-level intermediate representation that separates concerns of program representation from underlying architectures, which facilitates portability of OpenARC. In fact, this design allowed us to create the OpenACC-to-FPGA translation framework with minimal extensions to our existing system. In addition, we show that our proposed FPGA-specific compiler optimizations and novel OpenACC pragma extensions assist the compiler in generating more efficient FPGA hardware configuration files. Our empirical evaluation on an Altera Stratix V FPGA with eight OpenACC benchmarks demonstrates the benefits of our strategy. To demonstrate the portability of OpenARC, we show results for the same benchmarks executing on other heterogeneous platforms, including NVIDIA GPUs, AMD GPUs, and Intel Xeon Phis. This initial evidence helps support the goal of using a directive-based, high-level programming strategy for performance portability across heterogeneous HPC architectures.},
doi = {10.1109/IPDPS.2016.28},
place = {United States},
year = 2016,
month = 1
}


  • Directive-based accelerator programming models such as OpenACC have arisen as an alternative solution to program emerging Scalable Heterogeneous Computing (SHC) platforms. However, the increased complexity in the SHC systems incurs several challenges in terms of portability and productivity. This paper presents an open-sourced OpenACC compiler, called OpenARC, which serves as an extensible research framework to address those issues in the directive-based accelerator programming. This paper explains important design strategies and key compiler transformation techniques needed to implement the reference OpenACC compiler. Moreover, this paper demonstrates the efficacy of OpenARC as a research framework for directive-based programming study, by proposing and implementing OpenACC extensions in the OpenARC framework to 1) support hybrid programming of the unified memory and separate memory and 2) exploit architecture-specific features in an abstract manner. Porting thirteen standard OpenACC programs and three extended OpenACC programs to CUDA GPUs shows that OpenARC performs similarly to a commercial OpenACC compiler, while it serves as a high-level research framework.
  • Reconfigurable computing (RC) is being investigated as a hardware solution for improving time-to-solution for biomolecular simulations. A number of popular molecular dynamics (MD) codes are used to study various aspects of biomolecules. These codes are now capable of simulating nanosecond time-scale trajectories per day on conventional microprocessor-based hardware, but biomolecular processes often occur at the microsecond time-scale or longer. A wide gap exists between the desired and achievable simulation capability; therefore, there is considerable interest in alternative algorithms and hardware for improving the time-to-solution of MD codes. The fine-grain parallelism provided by Field Programmable Gate Arrays (FPGAs), combined with their low power consumption, makes them an attractive solution for improving the performance of MD simulations. In this work, we use an FPGA-based coprocessor to accelerate the compute-intensive calculations of LAMMPS, a popular MD code, achieving up to 5.5-fold speed-up on the non-bonded force computations of the particle mesh Ewald method and up to 2.2-fold speed-up in overall time-to-solution, and potentially an increase by a factor of 9 in power-performance efficiencies for the pair-wise computations. The results presented here provide an example of the multi-faceted benefits to an application in a heterogeneous computing environment.
  • Current high performance computing (HPC) applications are found in many consumer, industrial and research fields. From web searches to auto crash simulations to weather predictions, these applications require large amounts of power by the compute farms and supercomputers required to run them. The demand for more and faster computation continues to increase along with an even sharper increase in the cost of the power required to operate and cool these installations. The ability of standard processor based systems to address these needs has declined in both speed of computation and in power consumption over the past few years. This paper presents a new method of computation based upon programmable logic as represented by Field Programmable Gate Arrays (FPGAs) that addresses these needs in a manner requiring only minimal changes to the current software design environment.
  • In this study, we use a combination of modeling techniques to describe the relationship between fracture radius that might be accomplished in a hypothetical enhanced geothermal system (EGS) and drilling distance required to create and access those fractures. We use a combination of commonly applied analytical solutions for heat transport in parallel fractures and 3D finite-element method models of more realistic heat extraction geometries. For a conceptual model involving multiple parallel fractures developed perpendicular to an inclined or horizontal borehole, calculations demonstrate that EGS will likely require very large fractures, of greater than 300 m radius, to keep interfracture drilling distances to ~10 km or less. As drilling distances are generally inversely proportional to the square of fracture radius, drilling costs quickly escalate as the fracture radius decreases. It is important to know, however, whether fracture spacing will be dictated by thermal or mechanical considerations, as the relationship between drilling distance and number of fractures is quite different in each case. Information about the likelihood of hydraulically creating very large fractures comes primarily from petroleum recovery industry data describing hydraulic fractures in shale. Those data suggest that fractures with radii on the order of several hundred meters may, indeed, be possible. The results of this study demonstrate that relatively simple calculations can be used to estimate primary design constraints on a system, particularly regarding the relationship between generated fracture radius and the total length of drilling needed in the fracture creation zone. Comparison of the numerical simulations of more realistic geometries than addressed in the analytical solutions suggests that simple proportionalities can readily be derived to relate a particular flow field.