Measuring FLOPS Using Hardware Performance Counter Technologies on LC systems
FLOPS (FLoating-point Operations Per Second) is a commonly used performance metric for scientific programs that rely heavily on floating-point (FP) calculations. The metric counts FP operations rather than instructions, which facilitates a fair comparison between different machines. A well-known use of this metric is the LINPACK benchmark, which is used to generate the Top500 list. It measures how fast a computer solves a dense N by N system of linear equations Ax=b, which requires a known number of FP operations, and reports the result in millions of FP operations per second (MFLOPS). While running a benchmark with a known FP workload can provide insight into the efficiency of a machine's FP pipelines relative to other machines, measuring the FLOPS of an arbitrary scientific application in a platform-independent manner is nontrivial. The goal of this paper is twofold. First, we explore the FP microarchitectures of the key processors that underpin the LC machines. Second, we present measurement techniques based on hardware performance monitoring counters that users can apply to obtain the native FLOPS of their programs; these are practical solutions readily available on LC platforms. By nature, however, these native FLOPS metrics are not directly comparable across different machines, mainly because FP operations are not consistent across microarchitectures. Thus, the first goal of this paper provides the base reference by which a user can interpret the measured FLOPS more judiciously.
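As an illustration of the "known number of FP operations" idea behind LINPACK, the following minimal sketch converts a problem size and a wall-clock time into MFLOPS. It assumes the conventional LINPACK operation count of 2/3·N³ + 2·N² for a dense solve, which is not stated in the abstract; the function names and the sample timing are hypothetical and are not taken from the report.

```python
def linpack_flop_count(n: int) -> float:
    """Nominal FP operation count for solving a dense n-by-n system Ax = b.

    Assumption: the conventional LINPACK count 2/3*n^3 + 2*n^2
    (LU factorization plus the triangular solves).
    """
    return (2.0 / 3.0) * n**3 + 2.0 * n**2


def mflops(flop_count: float, elapsed_seconds: float) -> float:
    """Convert an FP operation count and elapsed time into MFLOPS."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return flop_count / elapsed_seconds / 1e6


if __name__ == "__main__":
    n = 1000        # problem size (hypothetical)
    elapsed = 0.5   # measured solve time in seconds (hypothetical)
    print(f"N={n}: {mflops(linpack_flop_count(n), elapsed):.1f} MFLOPS")
```

This works only because the operation count is fixed by the algorithm and problem size. For an arbitrary application no such closed-form count exists, which is why the report turns to hardware performance counters instead.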
- Research Organization: Lawrence Livermore National Laboratory (LLNL), Livermore, CA
- Sponsoring Organization: USDOE
- DOE Contract Number: W-7405-ENG-48
- OSTI ID: 945513
- Report Number(s): LLNL-TR-406864
- Country of Publication: United States
- Language: English