
An Integrated Performance Visualizer for MPI/OpenMP Programs

Conference
Cluster computing has emerged as a de facto standard in parallel computing over the last decade. Researchers now use clusters of shared-memory multiprocessors (SMPs) to attack some of the largest and most complex scientific calculations in the world today [2, 1], running them on the world's largest machines, including the US DOE ASCI platforms: Red, Blue Mountain, Blue Pacific, and White. MPI has been the predominant programming model for clusters [3]; however, as users move to "wider" SMPs, the combination of MPI and threads is a "natural fit" for the underlying system design: MPI manages parallelism between SMPs, and threads manage parallelism within one SMP. OpenMP is emerging as a leading contender for managing parallelism within an SMP.

OpenMP and MPI offer their users very different characteristics. Developed for different memory models, they fill diametrically opposed needs in parallel programming: OpenMP targets shared-memory systems, while MPI targets distributed-memory systems; OpenMP provides explicit parallelism with implicit data movement, while MPI provides explicit data movement with implicit parallelism. These complementary characteristics make the two frameworks well suited to the two parallel environments presented by cluster computing: shared memory within a box and distributed memory between boxes.

Unfortunately, simply writing OpenMP and MPI code does not guarantee efficient use of the underlying cluster hardware. What is more, existing tools provide performance information about either MPI or OpenMP, but not both. This lack of integration prevents users from understanding the critical path for performance in their applications; an integrated view would also help users adjust their performance expectations to their application's software design. Once users decide to investigate their application's performance, they need detailed information about the cost of operations in the application, and message-passing activity and OpenMP regions are most likely tied to those expensive operations. Viewed in this light, users need a performance analyzer that captures the complex interactions of MPI and OpenMP. For message-passing codes, several performance analysis tools exist, such as Vampir, TimeScan, and Paragraph [make citations]. For OpenMP codes there is GuideView, along with a few other proprietary vendor tools [make citations]. In practice, however, there is little production-quality support for the combination of MPI and OpenMP.
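To make the division of labor concrete, below is a minimal hybrid MPI/OpenMP sketch in C (illustrative only, not taken from the paper): MPI ranks split work between SMP nodes, while an OpenMP parallel loop uses the processors within each node; the reduction at the end is the kind of cross-node communication a performance visualizer must relate back to the OpenMP regions.

    /* Minimal hybrid MPI/OpenMP sketch (illustrative, not from the paper). */
    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int provided, rank;

        /* Request thread support so OpenMP threads can coexist with MPI calls. */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* OpenMP: parallelism within one SMP node (shared memory, implicit data movement). */
        double local_sum = 0.0;
        #pragma omp parallel for reduction(+:local_sum)
        for (int i = 0; i < 1000000; i++)
            local_sum += 1.0 / (i + 1);

        /* MPI: parallelism between nodes (explicit data movement between address spaces). */
        double global_sum = 0.0;
        MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("global sum = %f\n", global_sum);

        MPI_Finalize();
        return 0;
    }

Built with an MPI compiler wrapper and an OpenMP flag (for example, mpicc -fopenmp), each MPI process runs the loop with its own team of OpenMP threads, so an integrated tool must attribute time both to the threaded region and to the MPI_Reduce call.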
Research Organization:
Lawrence Livermore National Lab., CA (US)
Sponsoring Organization:
US Department of Energy (US)
DOE Contract Number:
W-7405-ENG-48
OSTI ID:
15005336
Report Number(s):
UCRL-JC-142829
Country of Publication:
United States
Language:
English

References (4)

Waiting time analysis and performance visualization in Carnival (conference, January 1996)
Visualizing the performance of parallel programs (journal, September 1991)
Very high resolution simulation of compressible turbulence on the IBM-SP system (conference, January 1999)
High-Performance Reactive Fluid Flow Simulations Using Adaptive Mesh Refinement on Thousands of Processors (conference, January 2000)
