U.S. Department of Energy
Office of Scientific and Technical Information

On the effective implementation of a boundary element code on graphics processing units using an out-of-core LU algorithm

Journal Article · Engineering Analysis with Boundary Elements

A collocation boundary element code for solving the three-dimensional Laplace equation, publicly available from http://www.intetec.org, has been adapted to run on an Nvidia Tesla general-purpose graphics processing unit (GPU). Global matrix assembly and LU factorization of the resulting dense matrix were performed on the GPU. Out-of-core techniques were used to solve problems larger than the available GPU memory. The code achieved a speedup of more than eight times in matrix assembly and about 56 Gflop/s in the LU factorization using only 512 Mbytes of GPU memory. Details of the GPU implementation and comparisons with the standard sequential algorithm are included to illustrate the performance of the GPU code.
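
As a rough illustration of the out-of-core idea mentioned in the abstract, the sketch below factors a dense matrix one column panel at a time, so that only the current panel and one previously factored panel need to be resident in fast memory at once. It is a minimal pure-Python sketch under stated assumptions, not the code described in the record: the names `lu_out_of_core`, `multiply_factors`, and `panel_width` are hypothetical, the nested list `a` stands in for host memory, the local lists `panel` and `left` stand in for GPU buffers, and pivoting is omitted for brevity (a production solver would normally use partial pivoting).

```python
def lu_out_of_core(a, panel_width):
    """Left-looking panel LU without pivoting (illustrative assumption).

    `a` is a square matrix stored as a list of rows and is overwritten with
    the unit-lower-triangular L (below the diagonal) and U (on and above it).
    """
    n = len(a)
    for c0 in range(0, n, panel_width):
        c1 = min(c0 + panel_width, n)
        width = c1 - c0
        # "Load" the current column panel into fast memory.
        panel = [row[c0:c1] for row in a]

        # Apply every previously factored panel to the current one.
        for k0 in range(0, c0, panel_width):
            k1 = min(k0 + panel_width, c0)
            # "Load" one factored panel (its L entries) at a time.
            left = [row[k0:k1] for row in a]
            # Forward substitution with the unit-lower-triangular diagonal block.
            for r in range(k0, k1):
                for s in range(k0, r):
                    l_rs = left[r][s - k0]
                    for j in range(width):
                        panel[r][j] -= l_rs * panel[s][j]
            # Update the rows below the diagonal block (matrix-matrix product).
            for r in range(k1, n):
                for s in range(k0, k1):
                    l_rs = left[r][s - k0]
                    for j in range(width):
                        panel[r][j] -= l_rs * panel[s][j]

        # Factor the updated panel in place.
        for c in range(c0, c1):
            pivot = panel[c][c - c0]
            for r in range(c + 1, n):
                panel[r][c - c0] /= pivot
                l_rc = panel[r][c - c0]
                for j in range(c - c0 + 1, width):
                    panel[r][j] -= l_rc * panel[c][j]

        # "Store" the finished panel back to slow memory.
        for r in range(n):
            a[r][c0:c1] = panel[r]
    return a


def multiply_factors(lu):
    """Rebuild L @ U from the packed factors, for checking the result."""
    n = len(lu)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            total = 0.0
            for k in range(min(i, j) + 1):
                l_ik = 1.0 if k == i else lu[i][k]
                total += l_ik * lu[k][j]
            out[i][j] = total
    return out
```

Calling `lu_out_of_core(matrix, panel_width)` with a small `panel_width` mimics a tight memory budget, while setting `panel_width` to the matrix order reduces to an ordinary in-core factorization; in both cases `multiply_factors` should reproduce the original matrix up to rounding, provided no zero pivot is encountered.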

Research Organization:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF); Oak Ridge National Laboratory (ORNL); Center for Computational Sciences
Sponsoring Organization:
DOE Office of Science; SC USDOE - Office of Science (SC)
DOE Contract Number:
AC05-00OR22725
OSTI ID:
1041419
Journal Information:
Journal Name: Engineering Analysis with Boundary Elements; Journal Volume: 36; Journal Issue: 8; ISSN 0955-7997
Country of Publication:
United States
Language:
English

Similar Records

A graphics processing unit accelerated sparse direct solver and preconditioner with block low rank compression
Journal Article · Mon Sep 30 00:00:00 EDT 2024 · International Journal of High Performance Computing Applications · OSTI ID:2499469

An efficient tensor transpose algorithm for multicore CPU, Intel Xeon Phi, and NVidia Tesla GPU
Journal Article · Sun Jan 04 23:00:00 EST 2015 · Computer Physics Communications · OSTI ID:1185465

A communication-avoiding 3D algorithm for sparse LU factorization on heterogeneous systems
Journal Article · Mon Aug 19 00:00:00 EDT 2019 · Journal of Parallel and Distributed Computing · OSTI ID:1559632