Summary: Quartz: A Tool for Tuning Parallel Program Performance
Thomas E. Anderson and Edward D. Lazowska
Department of Computer Science and Engineering
University of Washington
Seattle WA 98195
Initial implementations of parallel programs typically yield disappointing performance. Tuning to improve performance is thus a
significant part of the parallel programming process. The effort required to tune a parallel program, and the level of performance that
eventually is achieved, both depend heavily on the quality of the instrumentation that is available to the programmer.
This paper describes Quartz, a new tool for tuning parallel program performance on shared memory multiprocessors. The philosophy
underlying Quartz was inspired by that of the sequential UNIX tool gprof: to appropriately direct the attention of the programmer by
efficiently measuring just those factors that are most responsible for performance and by relating these metrics to one another and to the
structure of the program. This philosophy is even more important in the parallel domain than in the sequential domain, because of the
dramatically greater number of possible metrics and the dramatically increased complexity of program structures.
The principal metric of Quartz is normalized processor time: the total processor time spent in each section of code divided by the number
of other processors that are concurrently busy when that section of code is being executed. Tied to the logical structure of the program, this
metric provides a "smoking gun" pointing towards those areas of the program most responsible for poor performance. This information
can be acquired efficiently by checkpointing to memory the number of busy processors and the state of each processor, and then
statistically sampling these using a dedicated processor.
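
As an illustration only, the sampling idea can be sketched in a few lines of Python. This is not Quartz's implementation. The function name, the snapshot layout (one entry per processor, naming the code section it is executing, or None when idle), and the sampling interval are all hypothetical. The sketch also assumes that the divisor counts every busy processor, including the one being sampled, so that it is never zero; the definition above is worded in terms of the other busy processors.

```python
from collections import defaultdict

IDLE = None  # hypothetical marker for a processor that is not busy


def normalized_processor_time(snapshots, sample_interval=1.0):
    """Estimate normalized processor time per code section from samples.

    snapshots: list of samples; each sample is a list with one entry per
    processor, holding the section name it is executing or IDLE.
    Assumption: the divisor counts all busy processors, including the
    sampled one, so it is always at least 1.
    """
    totals = defaultdict(float)
    for snapshot in snapshots:
        busy = sum(1 for section in snapshot if section is not IDLE)
        for section in snapshot:
            if section is not IDLE:
                # Time spent while few processors are busy weighs more.
                totals[section] += sample_interval / busy
    return dict(totals)


# Three samples on a hypothetical four-processor machine.
samples = [
    ["init", IDLE, IDLE, IDLE],        # serial phase: one busy processor
    ["work", "work", "work", "work"],  # fully parallel phase
    ["work", "work", IDLE, IDLE],      # partially parallel phase
]
result = normalized_processor_time(samples)
```

In this made-up trace, "init" uses one unit of raw processor time and "work" uses six, but their normalized times are 1.0 and 2.0. The serial section therefore stands out far more than its raw time would suggest, which is the kind of signal the metric is meant to surface.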