On the effective implementation of a boundary element code on graphics processing units using an out-of-core LU algorithm
- ORNL
A collocation boundary element code for solving the three-dimensional Laplace equation, publicly available from http://www.intetec.org, has been adapted to run on an Nvidia Tesla general-purpose graphics processing unit (GPU). Global matrix assembly and LU factorization of the resulting dense matrix were performed on the GPU. Out-of-core techniques were used to solve problems larger than the available GPU memory. The code achieved a speedup of more than eight times in matrix assembly and about 56 Gflop/s in the LU factorization using only 512 Mbytes of GPU memory. Details of the GPU implementation and comparisons with the standard sequential algorithm are included to illustrate the performance of the GPU code.
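The out-of-core idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it is a plain-Python, right-looking blocked LU with partial pivoting, in which the trailing matrix is updated one column chunk at a time, so that only the current panel and one chunk need to be resident in the fast memory at once. The names `blocked_lu`, `block`, `host`, `panel`, and `chunk` are hypothetical, and the list copies merely stand in for host-to-GPU transfers.

```python
def blocked_lu(a, block):
    """Blocked LU with partial pivoting (illustrative sketch only).

    Returns (lu, perm): lu packs the unit lower factor L below the
    diagonal and U on and above it; perm[i] is the index of the
    original row that ended up in row i, so that P*A = L*U.
    """
    n = len(a)
    # 'host' stands in for the large matrix kept outside GPU memory.
    host = [[float(x) for x in row] for row in a]
    perm = list(range(n))

    for k0 in range(0, n, block):
        k1 = min(k0 + block, n)

        # Step 1: factor the current column panel (columns k0..k1-1).
        for k in range(k0, k1):
            # Partial pivoting: pick the largest entry in column k.
            p = max(range(k, n), key=lambda i: abs(host[i][k]))
            if host[p][k] == 0.0:
                raise ValueError("matrix is singular")
            # Swap whole rows so earlier L columns and later columns agree.
            host[k], host[p] = host[p], host[k]
            perm[k], perm[p] = perm[p], perm[k]
            for i in range(k + 1, n):
                host[i][k] /= host[k][k]
                for j in range(k + 1, k1):
                    host[i][j] -= host[i][k] * host[k][j]

        # Step 2: stream the trailing columns through in chunks.
        # 'panel' holds the multipliers just computed (rows k0..n-1).
        panel = [row[k0:k1] for row in host[k0:]]
        width = k1 - k0
        for c0 in range(k1, n, block):
            c1 = min(c0 + block, n)
            # Copy one chunk in; this models a transfer to the device.
            chunk = [row[c0:c1] for row in host[k0:]]
            # Apply the panel's eliminations to the chunk. For rows
            # inside the panel this is a unit-lower triangular solve;
            # for rows below it is the Schur-complement update.
            for k in range(width):
                for i in range(k + 1, len(chunk)):
                    m = panel[i][k]
                    for j in range(c1 - c0):
                        chunk[i][j] -= m * chunk[k][j]
            # Copy the updated chunk back to the large matrix.
            for r, row in enumerate(chunk):
                host[k0 + r][c0:c1] = row

    return host, perm
```

Calling `blocked_lu(a, block)` with a smaller `block` lowers the amount of data held in `panel` and `chunk` at any time, at the cost of more transfers. In a real GPU code the inner loops would presumably be replaced by library matrix kernels; that substitution is an assumption here, not something this record describes.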
- Research Organization:
- Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF); Oak Ridge National Laboratory (ORNL); Center for Computational Sciences
- Sponsoring Organization:
- DOE Office of Science; SC USDOE - Office of Science (SC)
- DOE Contract Number:
- AC05-00OR22725
- OSTI ID:
- 1041419
- Journal Information:
- Engineering Analysis with Boundary Elements, Vol. 36, Issue 8; ISSN 0955-7997
- Country of Publication:
- United States
- Language:
- English
Similar Records
An efficient tensor transpose algorithm for multicore CPU, Intel Xeon Phi, and NVidia Tesla GPU
A communication-avoiding 3D algorithm for sparse LU factorization on heterogeneous systems