A Massively Parallel Adaptive Fast-Multipole Method on Heterogeneous Architectures
- Lawrence Livermore National Laboratory (LLNL)
- Georgia Institute of Technology
- Oak Ridge National Laboratory (ORNL)
- University of Texas at Austin
- New York University
We present new scalable algorithms and an implementation of the kernel-independent fast multipole method (KIFMM), employing hybrid distributed-memory message passing (via MPI) and shared-memory/streaming parallelism using graphics processing unit (GPU) acceleration to rapidly evaluate two-body non-oscillatory potentials. On traditional CPU-only systems, our implementation scales well up to 30 billion unknowns on 65k cores (the AMD/Cray-based Kraken system at NSF/NICS) on tree data structures with up to 25 levels between leaves. On GPU-enabled systems, we achieve a 30x speedup for problems of up to 256 million points on 256 GPUs (Lincoln at NSF/NCSA) over a comparable CPU-only implementation. Both of these demonstrations represent the largest and fastest of their kind of which we are aware. We achieve scalability at extreme core counts by extending the initial work of Ying et al. (ACM/IEEE SC'03) with a new approach to scalable MPI-based tree construction and partitioning. For the sub-components of KIFMM, which include the direct and approximate interactions, the target evaluation, and the source-to-multipole translations, we use CUDA-based GPU acceleration to achieve excellent performance. Doing so requires carefully constructed data-structure transformations, which we describe, and whose cost we show is minor. Taken together, these components show promise for ultrascalable FMM in the petascale era and beyond.
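The direct-interaction step that the abstract describes (the near-field particle-to-particle evaluation that the implementation offloads to GPUs) amounts to summing a non-oscillatory kernel, such as the Laplace potential, over source points. The following is a minimal sketch in plain Python, not the authors' CUDA code; the function name and argument layout are illustrative assumptions.

```python
# Hedged sketch of the direct (particle-to-particle) evaluation of the
# Laplace kernel phi(x) = sum_j q_j / (4*pi*|x - y_j|), i.e. the
# near-field sub-component of KIFMM. This is an O(N*M) reference loop;
# the FMM replaces the far-field portion with multipole approximations.
import math

def direct_potential(targets, sources, charges):
    """Evaluate the Laplace potential at each target point.

    targets, sources: lists of 3-tuples of floats
    charges: list of floats, one per source
    """
    phi = []
    for x in targets:
        acc = 0.0
        for y, q in zip(sources, charges):
            r = math.dist(x, y)
            if r > 0.0:  # skip the singular self-interaction
                acc += q / (4.0 * math.pi * r)
        phi.append(acc)
    return phi
```

In an actual FMM, this loop runs only between each leaf box and its near-field neighbor list; everything farther away goes through the source-to-multipole and translation operators instead.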
- Research Organization:
- Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). National Center for Computational Sciences (NCCS)
- Sponsoring Organization:
- USDOE Office of Nuclear Energy (NE)
- DOE Contract Number:
- DE-AC05-00OR22725
- OSTI ID:
- 1033545
- Resource Relation:
- Conference: ACM/IEEE Supercomputing (SC'09), Portland, OR, USA, November 14, 2009
- Country of Publication:
- United States
- Language:
- English
Similar Records
Quantum Monte Carlo Endstation for Petascale Computing
Radiation modeling using the Uintah heterogeneous CPU/GPU runtime system. In: XSEDE '12 Proceedings of the 1st Conference of the Extreme Science and Engineering Discovery Environment: Bridging from the eXtreme to the campus and beyond, Article No. 4