On the effective implementation of a boundary element code on graphics processing units using an out-of-core LU algorithm

D'Azevedo, Ed F; Nintcheu Fata, Sylvain

doi:10.1016/j.enganabound.2012.02.014

Journal Article · Sun Jan 01 04:00:00 EST 2012 · Engineering Analysis with Boundary Elements

DOI: https://doi.org/10.1016/j.enganabound.2012.02.014 · OSTI ID: 1041419

D'Azevedo, Ed F [1]; Nintcheu Fata, Sylvain [1]

[1] ORNL

A collocation boundary element code for solving the three-dimensional Laplace equation, publicly available from http://www.intetec.org, has been adapted to run on an Nvidia Tesla general-purpose graphics processing unit (GPU). Global matrix assembly and LU factorization of the resulting dense matrix were performed on the GPU. Out-of-core techniques were used to solve problems larger than the available GPU memory. The code achieved a speedup of more than eight times in matrix assembly and about 56 Gflop/s in the LU factorization using only 512 MB of GPU memory. Details of the GPU implementation and comparisons with the standard sequential algorithm are included to illustrate the performance of the GPU code.
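The out-of-core idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' GPU code: it assumes a left-looking, panel-by-panel LU factorization without pivoting (so it is only safe for matrices such as diagonally dominant ones), and it uses plain Python lists, with a small working buffer standing in for the limited GPU memory. The names `load_panel`, `store_panel`, `out_of_core_lu`, `multiply_lu`, and `panel_cols` are hypothetical and chosen for illustration only.

```python
def load_panel(host, j0, j1):
    """Copy columns [j0, j1) of the host matrix into a working buffer
    (a stand-in for a host-to-device transfer)."""
    return [row[j0:j1] for row in host]


def store_panel(host, panel, j0):
    """Write a working buffer back into the host matrix, starting at column j0."""
    for i, prow in enumerate(panel):
        host[i][j0:j0 + len(prow)] = prow


def out_of_core_lu(a, panel_cols):
    """Left-looking LU without pivoting, one column panel at a time.

    Only the current panel and one previously factored panel are held in
    working buffers at any moment. Returns L and U packed in one matrix
    (unit diagonal of L implied).
    """
    n = len(a)
    host = [[float(x) for x in row] for row in a]  # the "out-of-core" storage
    for j0 in range(0, n, panel_cols):
        j1 = min(j0 + panel_cols, n)
        w = j1 - j0
        panel = load_panel(host, j0, j1)
        # Apply updates from every previously factored panel, one at a time.
        for k0 in range(0, j0, panel_cols):
            k1 = min(k0 + panel_cols, j0)
            left = load_panel(host, k0, k1)
            for k in range(k0, k1):
                for i in range(k + 1, n):
                    l_ik = left[i][k - k0]
                    for c in range(w):
                        panel[i][c] -= l_ik * panel[k][c]
        # Factor the current panel in place (unblocked elimination).
        for c in range(w):
            col = j0 + c
            pivot = panel[col][c]
            if pivot == 0.0:
                raise ZeroDivisionError("zero pivot; this sketch does not pivot")
            for i in range(col + 1, n):
                panel[i][c] /= pivot
                for c2 in range(c + 1, w):
                    panel[i][c2] -= panel[i][c] * panel[col][c2]
        store_panel(host, panel, j0)
    return host


def multiply_lu(lu):
    """Rebuild A = L * U from the packed factors, for checking the result."""
    n = len(lu)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(min(i, j) + 1):
                l_ik = 1.0 if k == i else lu[i][k]
                out[i][j] += l_ik * lu[k][j]
    return out
```

Calling `out_of_core_lu(a, panel_cols=2)` on a small diagonally dominant matrix and passing the result to `multiply_lu` should reproduce `a` up to rounding. In this sketch the working-buffer size scales with `panel_cols`, so a narrower panel trades more transfers for less working memory; a production code would also need pivoting and a tuned GPU kernel for the updates.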

Research Organization: Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF); Oak Ridge National Laboratory (ORNL); Center for Computational Sciences

Sponsoring Organization: USDOE Office of Science (SC)

DOE Contract Number: AC05-00OR22725

OSTI ID: 1041419

Journal Information: Engineering Analysis with Boundary Elements, Vol. 36, Issue 8; ISSN 0955-7997

Country of Publication: United States

Language: English

Similar Records

A graphics processing unit accelerated sparse direct solver and preconditioner with block low rank compression

Journal Article · Sun Sep 29 20:00:00 EDT 2024 · International Journal of High Performance Computing Applications · OSTI ID:2499469

An efficient tensor transpose algorithm for multicore CPU, Intel Xeon Phi, and NVidia Tesla GPU

Journal Article · Sun Jan 04 19:00:00 EST 2015 · Computer Physics Communications · OSTI ID:1185465

A communication-avoiding 3D algorithm for sparse LU factorization on heterogeneous systems

Journal Article · Sun Aug 18 20:00:00 EDT 2019 · Journal of Parallel and Distributed Computing · OSTI ID:1559632

Related Subjects

99 GENERAL AND MISCELLANEOUS
ALGORITHMS
FACTORIZATION
IMPLEMENTATION
LAPLACE EQUATION
PERFORMANCE
PROCESSING
numerical linear algebra
