On the effective implementation of a boundary element code on graphics processing units using an out-of-core LU algorithm
Journal Article
·
· Engineering Analysis with Boundary Elements
- ORNL
A collocation boundary element code for solving the three-dimensional Laplace equation, publicly available from http://www.intetec.org, has been adapted to run on an Nvidia Tesla general-purpose graphics processing unit (GPU). Global matrix assembly and LU factorization of the resulting dense matrix were performed on the GPU. Out-of-core techniques were used to solve problems larger than the available GPU memory. The code achieved a speedup of more than eight times in matrix assembly and about 56 Gflop/s in the LU factorization using only 512 Mbytes of GPU memory. Details of the GPU implementation and comparisons with the standard sequential algorithm are included to illustrate the performance of the GPU code.
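As an illustration of the out-of-core idea mentioned in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes a left-looking blocked LU without pivoting (a production code would normally use partial pivoting), runs on the CPU with NumPy, and only simulates the limited device memory by working on one column panel at a time. The function name `out_of_core_lu_sketch` and the parameter `panel_width` are hypothetical.

```python
import numpy as np


def out_of_core_lu_sketch(matrix, panel_width):
    """Left-looking blocked LU without pivoting (illustrative assumption).

    Returns one array holding the strictly lower part of L (unit diagonal
    implied) and the upper triangle U. Only one "current" column panel and
    one previously factored panel are touched at a time, which mimics
    keeping the full matrix in host memory and streaming panels to a device.
    """
    a = np.array(matrix, dtype=float)  # work on a copy ("host" storage)
    n = a.shape[0]
    for j0 in range(0, n, panel_width):
        j1 = min(j0 + panel_width, n)
        panel = a[:, j0:j1].copy()  # "load" the current panel
        # Apply the updates from every previously factored panel.
        for k0 in range(0, j0, panel_width):
            k1 = min(k0 + panel_width, n)
            prev = a[k0:, k0:k1]  # "load" a factored panel (rows k0..n-1)
            nb = k1 - k0
            l_kk = np.tril(prev[:nb], -1) + np.eye(nb)
            # U block: solve L_kk * U_kj = current rows k0..k1-1
            panel[k0:k1] = np.linalg.solve(l_kk, panel[k0:k1])
            # Update the rows below with the L block of the earlier panel.
            panel[k1:] -= prev[nb:] @ panel[k0:k1]
        # Unblocked factorization of the diagonal and sub-diagonal part.
        for c in range(j1 - j0):
            r = j0 + c
            panel[r + 1:, c] /= panel[r, c]
            panel[r + 1:, c + 1:] -= np.outer(panel[r + 1:, c], panel[r, c + 1:])
        a[:, j0:j1] = panel  # "store" the factored panel back
    return a
```

The panel width controls how much memory the working set needs: the sketch never touches more than two column panels at once, which is the property an out-of-core scheme relies on when the matrix exceeds device memory. Because pivoting is omitted, the sketch is only suitable for matrices that are safe to factor without it, such as strictly diagonally dominant ones.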
- Research Organization:
- Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF); Center for Computational Sciences
- Sponsoring Organization:
- DOE Office of Science; SC USDOE - Office of Science (SC)
- DOE Contract Number:
- AC05-00OR22725
- OSTI ID:
- 1041419
- Journal Information:
- Journal Name: Engineering Analysis with Boundary Elements; Journal Volume: 36; Journal Issue: 8; ISSN 0955-7997
- Country of Publication:
- United States
- Language:
- English
Similar Records
A graphics processing unit accelerated sparse direct solver and preconditioner with block low rank compression
Journal Article
·
Sun Sep 29 20:00:00 EDT 2024
· International Journal of High Performance Computing Applications
·
OSTI ID:2499469
An efficient tensor transpose algorithm for multicore CPU, Intel Xeon Phi, and NVidia Tesla GPU
Journal Article
·
Sun Jan 04 19:00:00 EST 2015
· Computer Physics Communications
·
OSTI ID:1185465
A communication-avoiding 3D algorithm for sparse LU factorization on heterogeneous systems
Journal Article
·
Sun Aug 18 20:00:00 EDT 2019
· Journal of Parallel and Distributed Computing
·
OSTI ID:1559632