A performance model for the communication in fast multipole methods on high-performance computing platforms

Ibeid, Huda; Yokota, Rio; Keyes, David

doi:10.1177/1094342016634819

Journal Article · Wed Jul 27 2016 · International Journal of High Performance Computing Applications

DOI: https://doi.org/10.1177/1094342016634819 · OSTI ID: 1565623

Ibeid, Huda [1]; Yokota, Rio [1]; Keyes, David [1]

[1] Division of Computer, Electrical and Mathematical Sciences and Engineering, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia

Exascale systems are predicted to have approximately 1 billion cores, assuming gigahertz cores. Limitations on affordable network topologies for distributed memory systems of such massive scale bring new challenges to the currently dominant parallel programming model. Currently, there are many efforts to evaluate the hardware and software bottlenecks of exascale designs. It is therefore of interest to model application performance and to understand what changes need to be made to ensure extrapolated scalability. The fast multipole method (FMM) was originally developed for accelerating N-body problems in astrophysics and molecular dynamics but has recently been extended to a wider range of problems. Its high arithmetic intensity, combined with its linear complexity and asynchronous communication patterns, makes it a promising algorithm for exascale systems. In this paper, we discuss the challenges for FMM on current parallel computers and future exascale architectures, with a focus on internode communication. We focus on the communication part only; the efficiency of the computational kernels is beyond the scope of the present study. We develop a performance model that considers the communication patterns of the FMM and observe a good match between our model and the actual communication time on four high-performance computing (HPC) systems, when latency, bandwidth, network topology, and multicore penalties are all taken into account. To our knowledge, this is the first formal characterization of internode communication in FMM that validates the model against actual measurements of communication time. The ultimate communication model would be predictive in an absolute sense; on complex systems, however, this objective is often out of reach, or its difficulty is out of proportion to its benefit when a simpler model exists that is inexpensive and sufficient to guide coding decisions that lead to improved scaling. The current model provides such guidance.
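The abstract names latency, bandwidth, network topology, and multicore penalties as the ingredients of the communication model but does not state the model itself. As a rough illustration only, the sketch below uses the generic textbook latency-bandwidth ("alpha-beta") cost of a message; it is not the model from the paper. The names `Network`, `message_time`, and `phase_time`, the `hops` factor (a crude stand-in for topology), and the `sharing` factor (a crude stand-in for several cores contending for one network interface) are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Network:
    """Hypothetical machine description; both values are assumed inputs."""
    latency_s: float          # alpha: startup cost per message, in seconds
    bandwidth_bytes_s: float  # 1/beta: sustained bandwidth, in bytes per second


def message_time(net: Network, nbytes: int, hops: int = 1, sharing: float = 1.0) -> float:
    """Estimate the time of one point-to-point message.

    The latency term is multiplied by `hops` and the transfer term by
    `sharing`; both are simplifying assumptions, not measured penalties.
    """
    if nbytes < 0 or hops < 1 or sharing < 1.0:
        raise ValueError("need nbytes >= 0, hops >= 1 and sharing >= 1")
    return hops * net.latency_s + nbytes * sharing / net.bandwidth_bytes_s


def phase_time(net: Network, message_sizes: Iterable[int],
               hops: int = 1, sharing: float = 1.0) -> float:
    """Sum the cost of messages sent one after another in a communication phase."""
    return sum(message_time(net, n, hops, sharing) for n in message_sizes)
```

With an assumed 1 microsecond latency and 1 GB/s bandwidth, `message_time(Network(1e-6, 1e9), 1000)` splits evenly between the latency and transfer terms, which shows why small messages are latency-bound under this kind of model.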

Research Organization: Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF); Argonne National Lab. (ANL), Argonne, IL (United States); UT-Battelle LLC/ORNL, Oak Ridge, TN (United States)

Sponsoring Organization: USDOE Office of Science (SC)

DOE Contract Number: AC02-06CH11357; AC05-00OR22725

OSTI ID: 1565623

Journal Information: International Journal of High Performance Computing Applications, Vol. 30, Issue 4; ISSN 1094-3420

Publisher: SAGE

Country of Publication: United States

Language: English

Related Subjects

Computer Science
