Communication Characterization and Optimization of Applications Using Topology-Aware Task Mapping on Large Supercomputers
Abstract
On large supercomputers, the job scheduling systems may assign a non-contiguous node allocation for user applications depending on available resources. With parallel applications using MPI (Message Passing Interface), the default process ordering does not take into account the actual physical node layout available to the application. This contributes to non-locality in terms of physical network topology and impacts communication performance of the application. In order to mitigate such performance penalties, this work describes techniques to identify suitable task mapping that takes the layout of the allocated nodes as well as the application's communication behavior into account. During the first phase of this research, we instrumented and collected performance data to characterize communication behavior of critical US DOE (United States - Department of Energy) applications using an augmented version of the mpiP tool. Subsequently, we developed several reordering methods (spectral bisection, neighbor join tree etc.) to combine node layout and application communication data for optimized task placement. We developed a tool called mpiAproxy to facilitate detailed evaluation of the various reordering algorithms without requiring full application executions. This work presents a comprehensive performance evaluation (14,000 experiments) of the various task mapping techniques in lowering communication costs on Titan, the leadership class supercomputer at Oak Ridge National Laboratory.
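As a rough illustration of the spectral bisection idea named in the abstract, the sketch below splits a small set of tasks into two groups so that heavily communicating tasks land in the same group. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`fiedler_order`, `spectral_bisect`, `cut_traffic`) and the 4-task traffic matrix are hypothetical, and the record does not describe how the paper's method combines this step with the physical node layout.

```python
import numpy as np


def fiedler_order(comm):
    """Order tasks by the Fiedler vector of the communication graph.

    `comm[i][j]` is a hypothetical measure of traffic sent from task i to j.
    """
    w = np.asarray(comm, dtype=float)
    w = w + w.T                      # treat traffic in both directions as one edge weight
    np.fill_diagonal(w, 0.0)         # ignore self-communication
    lap = np.diag(w.sum(axis=1)) - w  # graph Laplacian L = D - W
    # eigh returns eigenvalues in ascending order, so column 1 holds the
    # eigenvector of the second-smallest eigenvalue (the Fiedler vector).
    _, vecs = np.linalg.eigh(lap)
    return np.argsort(vecs[:, 1], kind="stable")


def spectral_bisect(comm):
    """Split tasks into two equal-sized halves along the Fiedler ordering."""
    order = fiedler_order(comm)
    half = len(order) // 2
    return sorted(order[:half].tolist()), sorted(order[half:].tolist())


def cut_traffic(comm, part_a, part_b):
    """Total traffic (both directions) crossing between the two parts."""
    w = np.asarray(comm, dtype=float)
    return float(sum(w[i, j] + w[j, i] for i in part_a for j in part_b))


# Toy example (assumed data): tasks 0 and 2 exchange heavy traffic, as do 1 and 3.
comm = [
    [0, 1, 10, 0],
    [1, 0, 0, 10],
    [10, 0, 0, 1],
    [0, 10, 1, 0],
]
default_cut = cut_traffic(comm, [0, 1], [2, 3])   # ranks placed in default order
part_a, part_b = spectral_bisect(comm)
spectral_cut = cut_traffic(comm, part_a, part_b)
print(part_a, part_b, default_cut, spectral_cut)
```

In this toy matrix the default block placement separates both heavily communicating pairs, while the spectral split keeps each pair together, so less traffic crosses between the two groups. Applying the split recursively would produce a full ordering; whether the paper does so is not stated in this record.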
- Authors:
- Sreepathi, Sarat; D'Azevedo, Ed; Philip, Bobby; Worley, Patrick
- Oak Ridge National Laboratory, Oak Ridge, TN, USA
- Publication Date:
- 2016
- Research Org.:
- Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF)
- Sponsoring Org.:
- USDOE Office of Science (SC)
- OSTI Identifier:
- 1567437
- Resource Type:
- Conference
- Resource Relation:
- Conference: ICPE '16: Proceedings of the 7th ACM/SPEC International Conference on Performance Engineering
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 97 MATHEMATICS AND COMPUTING; Computer Science
Citation Formats
Sreepathi, Sarat, D'Azevedo, Ed, Philip, Bobby, and Worley, Patrick. Communication Characterization and Optimization of Applications Using Topology-Aware Task Mapping on Large Supercomputers. United States: N. p., 2016. Web. doi:10.1145/2851553.2851575.
Sreepathi, Sarat, D'Azevedo, Ed, Philip, Bobby, & Worley, Patrick. Communication Characterization and Optimization of Applications Using Topology-Aware Task Mapping on Large Supercomputers. United States. https://doi.org/10.1145/2851553.2851575
Sreepathi, Sarat, D'Azevedo, Ed, Philip, Bobby, and Worley, Patrick. 2016. "Communication Characterization and Optimization of Applications Using Topology-Aware Task Mapping on Large Supercomputers". United States. https://doi.org/10.1145/2851553.2851575.
@article{osti_1567437,
title = {Communication Characterization and Optimization of Applications Using Topology-Aware Task Mapping on Large Supercomputers},
author = {Sreepathi, Sarat and D'Azevedo, Ed and Philip, Bobby and Worley, Patrick},
abstractNote = {On large supercomputers, the job scheduling systems may assign a non-contiguous node allocation for user applications depending on available resources. With parallel applications using MPI (Message Passing Interface), the default process ordering does not take into account the actual physical node layout available to the application. This contributes to non-locality in terms of physical network topology and impacts communication performance of the application. In order to mitigate such performance penalties, this work describes techniques to identify suitable task mapping that takes the layout of the allocated nodes as well as the application's communication behavior into account. During the first phase of this research, we instrumented and collected performance data to characterize communication behavior of critical US DOE (United States - Department of Energy) applications using an augmented version of the mpiP tool. Subsequently, we developed several reordering methods (spectral bisection, neighbor join tree etc.) to combine node layout and application communication data for optimized task placement. We developed a tool called mpiAproxy to facilitate detailed evaluation of the various reordering algorithms without requiring full application executions. This work presents a comprehensive performance evaluation (14,000 experiments) of the various task mapping techniques in lowering communication costs on Titan, the leadership class supercomputer at Oak Ridge National Laboratory.},
doi = {10.1145/2851553.2851575},
url = {https://www.osti.gov/biblio/1567437},
booktitle = {ICPE '16: Proceedings of the 7th ACM/SPEC International Conference on Performance Engineering},
place = {United States},
year = {2016},
month = {1}
}
Works referenced in this record:
Optimizing task layout on the Blue Gene/L supercomputer
journal, March 2005
- Bhanot, G.; Gara, A.; Heidelberger, P.
- IBM Journal of Research and Development, Vol. 49, Issue 2.3
Reducing the bandwidth of sparse symmetric matrices
conference, January 1969
- Cuthill, E.; McKee, J.
- Proceedings of the 1969 24th national conference
Task allocation onto a hypercube by recursive mincut bipartitioning
conference, January 1988
- Ercal, F.; Ramanujam, J.; Sadayappan, P.
- Proceedings of the third conference on Hypercube concurrent computers and applications: Architecture, software, computer systems, and general issues
Nested Dissection of a Regular Finite Element Mesh
journal, April 1973
- George, Alan
- SIAM Journal on Numerical Analysis, Vol. 10, Issue 2
Violin Plots: A Box Plot-Density Trace Synergism
journal, May 1998
- Hintze, Jerry L.; Nelson, Ray D.
- The American Statistician, Vol. 52, Issue 2
Generic topology mapping strategies for large-scale parallel architectures
conference, January 2011
- Hoefler, Torsten; Snir, Marc
- Proceedings of the international conference on Supercomputing - ICS '11
Hierarchical clustering schemes
journal, September 1967
- Johnson, Stephen C.
- Psychometrika, Vol. 32, Issue 3
Asynchronous Fast Adaptive Composite-Grid Methods: Numerical Results
journal, January 2003
- Lee, Barry; McCormick, Stephen F.; Philip, Bobby
- SIAM Journal on Scientific Computing, Vol. 25, Issue 2
Asynchronous Fast Adaptive Composite-Grid Methods for Elliptic Problems: Theoretical Foundations
journal, January 2004
- Lee, Barry; McCormick, Stephen F.; Philip, Bobby
- SIAM Journal on Numerical Analysis, Vol. 42, Issue 1
Asynchronous multilevel adaptive methods for solving partial differential equations on multiprocessors: Performance results
journal, November 1989
- McCormick, S.; Quinlan, D.
- Parallel Computing, Vol. 12, Issue 2
The fast adaptive composite grid (FAC) method for elliptic equations
journal, May 1986
- McCormick, S.; Thomas, J.
- Mathematics of Computation, Vol. 46, Issue 174
Dynamic implicit 3D adaptive mesh refinement for non-equilibrium radiation diffusion
journal, April 2014
- Philip, B.; Wang, Z.; Berrill, M. A.
- Journal of Computational Physics, Vol. 262
Genetic algorithm based task reordering to improve the performance of batch scheduled massively parallel scientific applications: Task Reordering to Improve Parallel Performance
journal, April 2015
- Sankaran, Ramanan; Angel, Jordan; Brown, W. Michael
- Concurrency and Computation: Practice and Experience, Vol. 27, Issue 17
Parallel static and dynamic multi-constraint graph partitioning
journal, January 2002
- Schloegel, Kirk; Karypis, George; Kumar, Vipin
- Concurrency and Computation: Practice and Experience, Vol. 14, Issue 3
Improving communication performance in dense linear algebra via topology aware collectives
conference, January 2011
- Solomonik, Edgar; Bhatele, Abhinav; Demmel, James
- Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis - SC '11
Application Characterization Using Oxbow Toolkit and PADS Infrastructure
conference, November 2014
- Sreepathi, Sarat; Grodowitz, M. L.; Lim, Robert
- 2014 Hardware-Software Co-Design for High Performance Computing (Co-HPC)