Parallel performance prediction using lost cycles analysis
- Univ. of Rochester, NY (United States). Dept. of Computer Science
Most performance debugging and tuning of parallel programs is based on the "measure-modify" approach, which is heavily dependent on detailed measurements of programs during execution. This approach is extremely time-consuming and does not lend itself to predicting performance under varying conditions. Analytic modeling and scalability analysis provide predictive power, but are not widely used in practice, due primarily to their emphasis on asymptotic behavior and the difficulty of developing accurate models that work for real-world programs. In this paper the authors describe a set of tools for performance tuning of parallel programs that bridges this gap between measurement and modeling. This approach is based on lost cycles analysis, which involves measurement and modeling of all sources of overhead in a parallel program. The authors first describe a tool for measuring overheads in parallel programs that they have incorporated into the runtime environment for Fortran programs on the Kendall Square KSR1. They then describe a tool that fits these overhead measurements to analytic forms. They illustrate the use of these tools by analyzing the performance tradeoffs among parallel implementations of 2D FFT. These examples show how their tools enable programmers to develop accurate performance models of parallel applications without requiring extensive performance modeling expertise.
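The idea of fitting measured overheads to an analytic form and then using the fit for prediction can be illustrated with a minimal sketch. This is not the authors' tool: the linear form `a + b * p`, the function names `fit_linear` and `predict_time`, the prediction formula (useful work plus lost cycles, divided by the processor count), and all numbers are assumptions chosen for illustration, and the measurements are synthetic.

```python
def fit_linear(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def predict_time(pure_compute, overhead_fit, p):
    """Assumed model: run time on p processors is
    (useful work + fitted lost cycles at p) / p."""
    a, b = overhead_fit
    return (pure_compute + a + b * p) / p


# Synthetic, made-up data (not from the paper): total lost
# processor-seconds observed at several processor counts.
procs = [2, 4, 8, 16]
lost = [3.0, 5.0, 9.0, 17.0]  # generated as 1.0 + 1.0 * p

# Fit the overhead to the assumed analytic form, then extrapolate
# to a processor count that was never measured.
fit = fit_linear(procs, lost)
predicted = predict_time(100.0, fit, 32)  # 100.0 s of useful work is hypothetical
```

In practice each overhead category would likely need its own analytic form, and the form would be chosen by comparing fits rather than assumed linear as it is here.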
- OSTI ID: 87660
- Report Number(s): CONF-941118-; ISBN 0-8186-6605-6; TRN: IM9535%%292
- Resource Relation: Conference: Supercomputing '94 meeting, Washington, DC (United States), 14-18 Nov 1994; Other Information: PBD: 1994; Related Information: Is Part Of Supercomputing '94: Proceedings; PB: 849 p.
- Country of Publication: United States
- Language: English