OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: A Massively Parallel Adaptive Fast-Multipole Method on Heterogeneous Architectures


We present new scalable algorithms and an implementation of the kernel-independent fast multipole method (KIFMM), employing hybrid distributed-memory message passing (via MPI) and shared-memory/streaming parallelism via graphics processing unit (GPU) acceleration to rapidly evaluate two-body non-oscillatory potentials. On traditional CPU-only systems, our implementation scales well up to 30 billion unknowns on 65K cores (the AMD/CRAY-based Kraken system at NSF/NICS) on tree data structures with 25 levels between the coarsest and finest leaves. On GPU-enabled systems, we achieve a 30× speedup for problems of up to 256 million points on 256 GPUs (Lincoln at NSF/NCSA) over a comparable CPU-only implementation. Both of these demonstrations are, to our knowledge, the largest and fastest of their kind. We achieve scalability at extreme core counts by extending the initial work of Ying et al. (ACM/IEEE SC '03) with a new approach to scalable MPI-based tree construction and partitioning. For the sub-components of KIFMM, namely the direct and approximate interactions, target evaluation, and source-to-multipole translations, we use CUDA-based GPU acceleration to achieve excellent performance. Doing so requires carefully constructed data structure transformations, which we describe and whose cost we show to be minor. Taken together, these components show promise for ultrascalable FMM in the petascale era and beyond.
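To make the GPU-offloaded sub-components concrete, the following is a minimal CUDA sketch of a direct-interaction (near-field) evaluation of the Laplace potential 1/(4πr), with one thread accumulating the potential at one target point over all sources. This is an illustrative assumption, not the authors' implementation: the kernel name direct_eval, the float4 packing of position plus charge, and the launch parameters are all hypothetical, and the KIFMM restricts such all-pairs sums to neighboring leaf boxes rather than the whole point set.

// Illustrative sketch only; not the code described in the abstract.
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// One thread per target: sum charge_j / (4*pi*|x_i - y_j|) over all sources.
__global__ void direct_eval(int n_src, const float4* __restrict__ src,  // xyz + charge
                            int n_trg, const float3* __restrict__ trg,
                            float* __restrict__ pot)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n_trg) return;

    const float inv4pi = 0.07957747154594767f;   // 1 / (4*pi)
    float3 t = trg[i];
    float acc = 0.0f;
    for (int j = 0; j < n_src; ++j) {             // all-pairs loop over sources
        float dx = src[j].x - t.x;
        float dy = src[j].y - t.y;
        float dz = src[j].z - t.z;
        float r2 = dx * dx + dy * dy + dz * dz;
        if (r2 > 0.0f)                            // skip coincident (self) points
            acc += src[j].w * rsqrtf(r2);
    }
    pot[i] = inv4pi * acc;
}

int main()
{
    const int n = 1 << 14;                        // small demo problem size
    std::vector<float4> src(n);
    std::vector<float3> trg(n);
    for (int i = 0; i < n; ++i) {
        float x = rand() / (float)RAND_MAX;
        float y = rand() / (float)RAND_MAX;
        float z = rand() / (float)RAND_MAX;
        src[i] = make_float4(x, y, z, 1.0f);      // unit charges
        trg[i] = make_float3(x, y, z);            // targets coincide with sources
    }

    float4* d_src; float3* d_trg; float* d_pot;
    cudaMalloc(&d_src, n * sizeof(float4));
    cudaMalloc(&d_trg, n * sizeof(float3));
    cudaMalloc(&d_pot, n * sizeof(float));
    cudaMemcpy(d_src, src.data(), n * sizeof(float4), cudaMemcpyHostToDevice);
    cudaMemcpy(d_trg, trg.data(), n * sizeof(float3), cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    direct_eval<<<blocks, threads>>>(n, d_src, n, d_trg, d_pot);
    cudaDeviceSynchronize();

    std::vector<float> pot(n);
    cudaMemcpy(pot.data(), d_pot, n * sizeof(float), cudaMemcpyDeviceToHost);
    printf("potential at target 0: %f\n", pot[0]);

    cudaFree(d_src); cudaFree(d_trg); cudaFree(d_pot);
    return 0;
}

In the full method this kernel-style evaluation applies only to the near-field lists of the adaptive octree, while far-field contributions go through the multipole/local translations; the data structure transformations mentioned in the abstract serve to lay out those interaction lists contiguously for the GPU.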

Research Organization:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). National Center for Computational Sciences (NCCS)
Sponsoring Organization:
USDOE Office of Nuclear Energy (NE)
DOE Contract Number:
DE-AC05-00OR22725
OSTI ID:
1033545
Resource Relation:
Conference: ACM/IEEE Supercomputing, Portland, OR, USA, 14 November 2009
Country of Publication:
United States
Language:
English