
An Integrated Performance Visualizer for MPI/OpenMP Programs

Conference
Cluster computing has emerged as a de facto standard in parallel computing over the last decade. Researchers now use clusters of shared-memory multiprocessors (SMPs) to attack some of the largest and most complex scientific calculations in the world today [2, 1], running them on the world's largest machines, including the US DOE ASCI platforms: Red, Blue Mountain, Blue Pacific, and White. MPI has been the predominant programming model for clusters [3]; however, as users move to "wider" SMPs, the combination of MPI and threads is a "natural fit" for the underlying system design: MPI manages parallelism between SMPs, and threads manage parallelism within one SMP. OpenMP is emerging as a leading contender for managing parallelism within an SMP.

OpenMP and MPI offer their users very different characteristics. Developed for different memory models, they fill diametrically opposed needs in parallel programming: OpenMP targets shared-memory systems, while MPI targets distributed-memory systems; OpenMP provides explicit parallelism with implicit data movement, while MPI provides explicit data movement with implicit parallelism. These complementary characteristics make the two frameworks well suited to the two parallel environments presented by cluster computing: shared memory within a box and distributed memory between boxes.

Unfortunately, simply writing OpenMP and MPI code does not guarantee efficient use of the underlying cluster hardware. What is more, existing tools provide performance information about either MPI or OpenMP, but not both. This lack of integration prevents users from understanding the critical path for performance in their applications; an integrated view would also help users adjust their performance expectations to their application's software design. Once users decide to investigate their application's performance, they need detailed information about the cost of operations in the application, and message-passing activity and OpenMP regions are most likely tied to those expensive operations. Viewed in this light, users need a performance analyzer that captures the complex interactions of MPI and OpenMP. For message-passing codes, several performance analysis tools exist, such as Vampir, TimeScan, and Paragraph [make citations]. For OpenMP codes there is GuideView, along with a few other proprietary vendor tools [make citations]. In practice, however, there is little production-quality support for the combination of MPI and OpenMP.
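To make the division of labor concrete, below is a minimal hybrid MPI/OpenMP sketch in C (illustrative only, not taken from the paper): MPI ranks split work between SMP nodes, while an OpenMP parallel loop uses the processors within each node; the reduction at the end is the kind of cross-node communication a performance visualizer must relate back to the OpenMP regions.

    /* Minimal hybrid MPI/OpenMP sketch (illustrative, not from the paper). */
    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int provided, rank;

        /* Request thread support so OpenMP threads can coexist with MPI calls. */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* OpenMP: parallelism within one SMP node (shared memory, implicit data movement). */
        double local_sum = 0.0;
        #pragma omp parallel for reduction(+:local_sum)
        for (int i = 0; i < 1000000; i++)
            local_sum += 1.0 / (i + 1);

        /* MPI: parallelism between nodes (explicit data movement between address spaces). */
        double global_sum = 0.0;
        MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("global sum = %f\n", global_sum);

        MPI_Finalize();
        return 0;
    }

Built with an MPI compiler wrapper and an OpenMP flag (for example, mpicc -fopenmp), each MPI process runs the loop with its own team of OpenMP threads, so an integrated tool must attribute time both to the threaded region and to the MPI_Reduce call.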
Research Organization:
Lawrence Livermore National Lab., CA (US)
Sponsoring Organization:
US Department of Energy (US)
DOE Contract Number:
W-7405-ENG-48
OSTI ID:
15005336
Report Number(s):
UCRL-JC-142829
Country of Publication:
United States
Language:
English

References (4)

Waiting time analysis and performance visualization in Carnival (conference, January 1996)
Visualizing the performance of parallel programs (journal, September 1991)
Very high resolution simulation of compressible turbulence on the IBM-SP system (conference, January 1999)
High-Performance Reactive Fluid Flow Simulations Using Adaptive Mesh Refinement on Thousands of Processors (conference, January 2000)
