U.S. Department of Energy
Office of Scientific and Technical Information

Instruction-Level Characterization of Scientific Computing Applications Using Hardware Performance Counters

Conference · OSTI ID: 760038
Author affiliation: Los Alamos National Laboratory

Workload characterization has proven to be an essential tool for architecture design and performance evaluation in both scientific and commercial computing. Traditional workload characterization metrics include FLOPS rate, cache miss ratios, and CPI (cycles per instruction) or its reciprocal IPC (instructions per cycle). Given the complexity of modern superscalar microprocessors, these traditional metrics are not powerful enough to pinpoint the performance bottleneck of an application on a specific microprocessor. Nor can they directly show the potential performance benefit of an architectural or functional improvement in a new processor design. To address these problems, many researchers rely on simulators, which face substantial constraints, especially for large-scale scientific computing applications. This paper presents a new technique for characterizing applications at the instruction level using hardware performance counters. Its advantage is that instruction-level characteristics can be collected in a few runs with virtually no overhead or slowdown. A variety of instruction counts can be used to calculate average abstract workload parameters corresponding to microprocessor pipelines or functional units. From the microprocessor's architectural constraints and these calculated abstract parameters, the architectural performance bottleneck for a specific application can be estimated. In particular, the analysis offers some insight into why only a small percentage of processor peak performance is achieved, even for many very cache-friendly codes. The bottleneck estimation can also suggest viable architectural or functional improvements for certain workloads. Eventually, these abstract parameters can lead to the creation of an analytical microprocessor pipeline model and memory hierarchy model.
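
As a rough illustration of the idea described in the abstract, the following Python sketch derives a few average workload parameters (CPI, IPC, instruction mix) from raw counter totals and compares the demand on each functional-unit class against a per-cycle issue limit. It is not the paper's actual model: the counter names, the `Counters` type, the issue limits in `ISSUE_LIMITS`, and the issue width are all hypothetical values chosen for the example, and a real processor would need its own limits and counter events.

```python
from dataclasses import dataclass


@dataclass
class Counters:
    """Raw totals read from hardware counters (field names are hypothetical)."""
    cycles: int
    instructions: int
    fp_ops: int
    loads_stores: int
    int_ops: int
    branches: int


# Assumed per-cycle issue limits for an imaginary 4-wide superscalar processor.
ISSUE_LIMITS = {"fp": 2, "mem": 1, "int": 2, "branch": 1}
ISSUE_WIDTH = 4


def class_counts(c: Counters) -> dict:
    """Group the raw counts by functional-unit class."""
    return {"fp": c.fp_ops, "mem": c.loads_stores,
            "int": c.int_ops, "branch": c.branches}


def workload_parameters(c: Counters) -> dict:
    """Average parameters: CPI, IPC, and the fraction of each instruction class."""
    mix = {k: v / c.instructions for k, v in class_counts(c).items()}
    return {"cpi": c.cycles / c.instructions,
            "ipc": c.instructions / c.cycles,
            "mix": mix}


def estimate_bottleneck(c: Counters, limits=ISSUE_LIMITS, width=ISSUE_WIDTH):
    """Return (resource, lower-bound cycles, fraction of measured cycles).

    Each resource needs at least count / limit cycles even if it issues at its
    limit every cycle; the largest such bound marks the likely bottleneck.
    """
    bounds = {k: n / limits[k] for k, n in class_counts(c).items()}
    bounds["issue"] = c.instructions / width
    resource = max(bounds, key=bounds.get)
    return resource, bounds[resource], bounds[resource] / c.cycles


if __name__ == "__main__":
    sample = Counters(cycles=1000, instructions=1200, fp_ops=400,
                      loads_stores=500, int_ops=200, branches=100)
    print(workload_parameters(sample))
    print(estimate_bottleneck(sample))
```

With the sample totals above, the memory unit gives the largest lower bound, and that bound covers only part of the measured cycles. In this simplified picture, the remaining cycles would have to be explained by effects the sketch ignores, such as dependencies and cache misses.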

Research Organization:
Los Alamos National Lab., NM (US)
Sponsoring Organization:
USDOE Office of Defense Programs (DP) (US)
DOE Contract Number:
W-7405-ENG-36
OSTI ID:
760038
Report Number(s):
LA-UR-98-4179
Country of Publication:
United States
Language:
English

Similar Records

Instruction-level performance modeling and characterization of multimedia applications
Conference · Tue Jun 01 00:00:00 EDT 1999 · OSTI ID:350949

The IBM RISC System/6000 processor; Hardware overview
Journal Article · Sun Dec 31 23:00:00 EST 1989 · IBM Journal of Research and Development (International Business Machines); (USA) · OSTI ID:7035844

Performance characterization and validation of ASCI applications: A memory centric view
Conference · Wed Oct 01 00:00:00 EDT 1997 · OSTI ID:532536