skip to main content
OSTI.GOV title logo U.S. Department of Energy
Office of Scientific and Technical Information

Title: Glprof: A Gprof inspired, Callgraph-oriented Per-Object Disseminating Memory Access Multi-Cache Profiler

Abstract

Application analysis is facilitated through a number of program profiling tools. The tools vary in their complexity, ease of deployment, design, and profiling detail. Specifically, understand- ing, analyzing, and optimizing is of particular importance for scientific applications where minor changes in code paths and data-structure layout can have profound effects. Understanding how intricate data-structures are accessed and how a given memory system responds is a complex task. In this paper we describe a trace profiling tool, Glprof, specifically aimed to lessen the burden of the programmer to pin-point heavily involved data-structures during an application's run-time, and understand data-structure run-time usage. Moreover, we showcase the tool's modularity using additional cache simulation components. We elaborate on the tool's design, and features. Finally we demonstrate the application of our tool in the context of Spec bench- marks using the Glprof profiler and two concurrently running cache simulators, PPC440 and AMD Interlagos.

Authors:
 [1];  [1]
  1. ORNL
Publication Date:
Research Org.:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF)
Sponsoring Org.:
USDOE Office of Science (SC)
OSTI Identifier:
1265553
DOE Contract Number:
AC05-00OR22725
Resource Type:
Conference
Resource Relation:
Conference: International Conference On Computational Science, Reykjavik, Iceland, 20150601, 20150603
Country of Publication:
United States
Language:
English

Citation Formats

Janjusic, Tommy, and Kartsaklis, Christos. Glprof: A Gprof inspired, Callgraph-oriented Per-Object Disseminating Memory Access Multi-Cache Profiler. United States: N. p., 2015. Web.
Janjusic, Tommy, & Kartsaklis, Christos. Glprof: A Gprof inspired, Callgraph-oriented Per-Object Disseminating Memory Access Multi-Cache Profiler. United States.
Janjusic, Tommy, and Kartsaklis, Christos. Thu . "Glprof: A Gprof inspired, Callgraph-oriented Per-Object Disseminating Memory Access Multi-Cache Profiler". United States. doi:.
@article{osti_1265553,
title = {Glprof: A Gprof inspired, Callgraph-oriented Per-Object Disseminating Memory Access Multi-Cache Profiler},
author = {Janjusic, Tommy and Kartsaklis, Christos},
abstractNote = {Application analysis is facilitated through a number of program profiling tools. The tools vary in their complexity, ease of deployment, design, and profiling detail. Specifically, understand- ing, analyzing, and optimizing is of particular importance for scientific applications where minor changes in code paths and data-structure layout can have profound effects. Understanding how intricate data-structures are accessed and how a given memory system responds is a complex task. In this paper we describe a trace profiling tool, Glprof, specifically aimed to lessen the burden of the programmer to pin-point heavily involved data-structures during an application's run-time, and understand data-structure run-time usage. Moreover, we showcase the tool's modularity using additional cache simulation components. We elaborate on the tool's design, and features. Finally we demonstrate the application of our tool in the context of Spec bench- marks using the Glprof profiler and two concurrently running cache simulators, PPC440 and AMD Interlagos.},
doi = {},
journal = {},
number = ,
volume = ,
place = {United States},
year = {Thu Jan 01 00:00:00 EST 2015},
month = {Thu Jan 01 00:00:00 EST 2015}
}

Conference:
Other availability
Please see Document Availability for additional information on obtaining the full-text document. Library patrons may search WorldCat to identify libraries that hold this conference proceeding.

Save / Share:
  • High-performance scientific computing relies increasingly on high-level large-scale object-oriented software frameworks to manage both algorithmic complexity and the complexities of parallelism: distributed data management, process management, inter-process communication, and load balancing. This encapsulation of data management, together with the prescribed semantics of a typical fundamental component of such object-oriented frameworks--a parallel or serial array-class library--provides an opportunity for increasingly sophisticated compile-time optimization techniques. This paper describes two optimizing transformations suitable for certain classes of numerical algorithms, one for reducing the cost of inter-processor communication, and one for improving cache utilization; demonstrates and analyzes the resulting performance gains; and indicates howmore » these transformations are being automated.« less
  • High-performance scientific computing relies increasingly on high-level large-scale object-oriented software frameworks to manage both algorithmic complexity and the complexities of parallelism: distributed data management, process management, inter-process communication, and load balancing. This encapsulation of data management, together with the prescribed semantics of a typical fundamental component of such object-oriented frameworks--a parallel or serial array-class library--provides an opportunity for increasingly sophisticated compile-time optimization techniques. This paper describes a technique for introducing cache blocking suitable for certain classes of numerical algorithms, demonstrates and analyzes the resulting performance gains, and indicates how this optimization transformation is being automated.
  • The authors describe the Control System of the CERN Neutrino Beamline, based on FactoryLink and equipment controllers using PCs running LynxOS. FactoryLink runs on a workstation and connects via communications tasks over Ethernet to both the PCs and the data acquisition systems of the experiments. The PCs access VME and CAMAC directly via the VICbus. Object-oriented control software, entirely data driven, deals with hardware modules and local survey operations. Remote stations have access via X-Window.
  • We describe the design of ScaLAPACK++, an object oriented C++ library for implementing linear algebra computations on distributed memory multicomputers. This package, when complete, will support distributed dense, banded, sparse matrix operations for symmetric, positive-definite, and non-symmetric cases. In ScaLAPACK++ we have employed object oriented design methods to enchance scalability, portability, flexibility, and ease-of-use. We illustrate some of these points by describing the implementation of a right-looking LU factorization for dense systems in ScaLAPACK++.