OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: A Massively Parallel Adaptive Fast-Multipole Method on Heterogeneous Architectures


We present new scalable algorithms and an implementation of the kernel-independent fast multipole method (KIFMM), employing hybrid distributed-memory message passing (via MPI) and shared-memory/streaming parallelism via graphics processing unit (GPU) acceleration to rapidly evaluate two-body non-oscillatory potentials. On traditional CPU-only systems, our implementation scales well up to 30 billion unknowns on 65K cores (the AMD/CRAY-based Kraken system at NSF/NICS) on tree data structures with 25 levels between the coarsest and finest leaves. On GPU-enabled systems, we achieve a 30× speedup for problems of up to 256 million points on 256 GPUs (Lincoln at NSF/NCSA) over a comparable CPU-only implementation. Both of these demonstrations are, to our knowledge, the largest and fastest of their kind. We achieve scalability at extreme core counts by extending the initial work of Ying et al. (ACM/IEEE SC '03) with a new approach to scalable MPI-based tree construction and partitioning. For the sub-components of KIFMM, namely the direct and approximate interactions, target evaluation, and source-to-multipole translations, we use CUDA-based GPU acceleration to achieve excellent performance. Doing so requires carefully constructed data structure transformations, which we describe and whose cost we show to be minor. Taken together, these components show promise for ultrascalable FMM in the petascale era and beyond.
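To make the GPU-offloaded sub-components concrete, the following is a minimal CUDA sketch of a direct-interaction (near-field) evaluation of the Laplace potential 1/(4πr), with one thread accumulating the potential at one target point over all sources. This is an illustrative assumption, not the authors' implementation: the kernel name direct_eval, the float4 packing of position plus charge, and the launch parameters are all hypothetical, and the KIFMM restricts such all-pairs sums to neighboring leaf boxes rather than the whole point set.

// Illustrative sketch only; not the code described in the abstract.
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// One thread per target: sum charge_j / (4*pi*|x_i - y_j|) over all sources.
__global__ void direct_eval(int n_src, const float4* __restrict__ src,  // xyz + charge
                            int n_trg, const float3* __restrict__ trg,
                            float* __restrict__ pot)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n_trg) return;

    const float inv4pi = 0.07957747154594767f;   // 1 / (4*pi)
    float3 t = trg[i];
    float acc = 0.0f;
    for (int j = 0; j < n_src; ++j) {             // all-pairs loop over sources
        float dx = src[j].x - t.x;
        float dy = src[j].y - t.y;
        float dz = src[j].z - t.z;
        float r2 = dx * dx + dy * dy + dz * dz;
        if (r2 > 0.0f)                            // skip coincident (self) points
            acc += src[j].w * rsqrtf(r2);
    }
    pot[i] = inv4pi * acc;
}

int main()
{
    const int n = 1 << 14;                        // small demo problem size
    std::vector<float4> src(n);
    std::vector<float3> trg(n);
    for (int i = 0; i < n; ++i) {
        float x = rand() / (float)RAND_MAX;
        float y = rand() / (float)RAND_MAX;
        float z = rand() / (float)RAND_MAX;
        src[i] = make_float4(x, y, z, 1.0f);      // unit charges
        trg[i] = make_float3(x, y, z);            // targets coincide with sources
    }

    float4* d_src; float3* d_trg; float* d_pot;
    cudaMalloc(&d_src, n * sizeof(float4));
    cudaMalloc(&d_trg, n * sizeof(float3));
    cudaMalloc(&d_pot, n * sizeof(float));
    cudaMemcpy(d_src, src.data(), n * sizeof(float4), cudaMemcpyHostToDevice);
    cudaMemcpy(d_trg, trg.data(), n * sizeof(float3), cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    direct_eval<<<blocks, threads>>>(n, d_src, n, d_trg, d_pot);
    cudaDeviceSynchronize();

    std::vector<float> pot(n);
    cudaMemcpy(pot.data(), d_pot, n * sizeof(float), cudaMemcpyDeviceToHost);
    printf("potential at target 0: %f\n", pot[0]);

    cudaFree(d_src); cudaFree(d_trg); cudaFree(d_pot);
    return 0;
}

In the full method this kernel-style evaluation applies only to the near-field lists of the adaptive octree, while far-field contributions go through the multipole/local translations; the data structure transformations mentioned in the abstract serve to lay out those interaction lists contiguously for the GPU.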

Research Organization:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). National Center for Computational Sciences (NCCS)
Sponsoring Organization:
USDOE Office of Nuclear Energy (NE)
DOE Contract Number:
DE-AC05-00OR22725
OSTI ID:
1033545
Resource Relation:
Conference: ACM/IEEE Supercomputing, Portland, OR, USA, 14 November 2009
Country of Publication:
United States
Language:
English