OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Performance Prediction Toolkit

Abstract

The Performance Prediction Toolkit (PPT) is a scalable co-design tool that contains hardware and middleware models, which accept proxy applications as input for runtime prediction. PPT relies on Simian, a parallel discrete event simulation engine in Python or Lua that uses the process concept, where each computing unit (host, node, core) is a Simian entity. Processes perform their tasks through message exchanges that let them remain active, sleep, wake up, begin, and end. The PPT hardware model of a compute core (such as a Haswell core) consists of a set of parameters, such as clock speed, memory hierarchy levels, their respective sizes, cache lines, access times for the different cache levels, and average cycle counts of ALU operations. These parameters are ideally read off a spec sheet or are learned with regression models fitted to hardware counter (PAPI) data. The compute core model offers an API to the software model: a function called time_compute(), which takes a tasklist as input. A tasklist is an unordered set of ALU and other CPU-type operations (in particular, virtual memory loads and stores). The PPT application model mimics the loop structure of the application and replaces the computational kernels with calls to the hardware model's time_compute() function, giving tasklists that model the compute kernels as input. A PPT application model thus consists of tasklists representing kernels and the higher-level loop structure, which we like to think of as pseudocode. The key challenge for the hardware model's time_compute() function is to translate virtual memory accesses into actual cache hierarchy level hits and misses. PPT also contains another CPU core level hardware model, the Analytical Memory Model (AMM). The AMM solves this challenge soundly, whereas our previous alternatives explicitly included the L1, L2, and L3 hit rates as inputs to the tasklists. Explicit hit rates inevitably reflect only the application modeler's best guess, perhaps informed by a few small test problems using hardware counters; in addition, hard-coded hit rates make the hardware model insensitive to changes in cache sizes. Instead, we use reuse distance distributions in the tasklists. In general, reuse profiles require the application modeler to run a very expensive trace analysis on the real code, which realistically can be done at best for small examples.
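
The idea described in the abstract can be illustrated with a minimal Python sketch. It is not PPT's actual code or API: the class `CoreModel`, its fields, the `(kind, count)` tasklist encoding, and every numeric value are assumptions made only for illustration. The sketch assumes fully associative LRU caches, so an access with reuse distance d (in distinct cache lines) hits the first level whose capacity exceeds d, and it derives hit fractions from a reuse distance distribution rather than taking them as hard-coded inputs.

```python
from dataclasses import dataclass

@dataclass
class CoreModel:
    """Toy stand-in for a core hardware model; all values are hypothetical."""
    clock_hz: float      # clock speed in Hz
    cache_lines: list    # capacity of each cache level in cache lines (L1, L2, L3)
    cache_cycles: list   # access latency of each cache level in cycles
    ram_cycles: float    # access latency of main memory in cycles
    alu_cycles: float    # average cycles per ALU operation

    def hit_fractions(self, reuse_hist):
        """Map a reuse distance distribution to per-level hit fractions.

        reuse_hist maps reuse distance -> probability; float('inf')
        stands for a first-time (cold) access. The last entry of the
        result is the fraction served by main memory.
        """
        fractions = [0.0] * (len(self.cache_lines) + 1)
        for distance, prob in reuse_hist.items():
            level = len(self.cache_lines)  # default: main memory
            for i, capacity in enumerate(self.cache_lines):
                if distance < capacity:    # LRU hit condition (assumption)
                    level = i
                    break
            fractions[level] += prob
        return fractions

    def time_compute(self, tasklist, reuse_hist):
        """Return the predicted time in seconds for a tasklist.

        tasklist is a list of (kind, count) pairs with kind in
        {'alu', 'load', 'store'}; this encoding is an assumption.
        """
        fractions = self.hit_fractions(reuse_hist)
        latencies = self.cache_cycles + [self.ram_cycles]
        avg_mem_cycles = sum(f * c for f, c in zip(fractions, latencies))
        cycles = 0.0
        for kind, count in tasklist:
            if kind == "alu":
                cycles += count * self.alu_cycles
            elif kind in ("load", "store"):
                cycles += count * avg_mem_cycles
            else:
                raise ValueError(f"unknown task kind: {kind}")
        return cycles / self.clock_hz

# Hypothetical core parameters and kernel description.
core = CoreModel(clock_hz=2e9, cache_lines=[512, 4096, 32768],
                 cache_cycles=[4, 12, 40], ram_cycles=200, alu_cycles=1)
reuse_hist = {8: 0.5, 1024: 0.25, 16384: 0.125, float("inf"): 0.125}
tasklist = [("alu", 1000), ("load", 100)]

fractions = core.hit_fractions(reuse_hist)
predicted = core.time_compute(tasklist, reuse_hist)

# Shrinking L1 changes the prediction without touching the tasklist.
small_l1 = CoreModel(clock_hz=2e9, cache_lines=[4, 4096, 32768],
                     cache_cycles=[4, 12, 40], ram_cycles=200, alu_cycles=1)
predicted_small_l1 = small_l1.time_compute(tasklist, reuse_hist)
print(fractions, predicted, predicted_small_l1)
```

Because the hit fractions are computed from the reuse distance distribution and the cache capacities, changing a cache size changes the predicted time, which is the sensitivity that hard-coded hit rates lack.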

Authors:
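Chennupati, Gopinath [1]; Santhi, Nanadakishore [1]; Eidenbenz, Stephen [1]; Zerr, Robert Joseph [1]; Rosa, Massimiliano [1]; Zamora, Richard James [1]; Park, Eun Jung [1]; Nadiga, Balasubramanya T. [1]; Liu, Jason [2]; Ahmed, Kishwar [2]; Obaida, Mohammad Abu [2]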
  1. LANL
  2. Florida International University
Publication Date:
2017-09-25
Research Org.:
Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
Sponsoring Org.:
USDOE
Contributing Org.:
Los Alamos National Laboratory (LANL)
OSTI Identifier:
1401959
Report Number(s):
PPT; 005497MLTPL00
C17098
DOE Contract Number:  
AC52-06NA25396
Resource Type:
Software
Software Revision:
00
Software Package Number:
005497
Software CPU:
MLTPL
Open Source:
Yes
Open source under the BSD license.
Source Code Available:
Yes
Related Software:
SIMIAN PDES
Country of Publication:
United States

Citation Formats

Chennupati, Gopinath, Santhi, Nanadakishore, Eidenbenz, Stephen, Zerr, Robert Joseph, Rosa, Massimiliano, Zamora, Richard James, Park, Eun Jung, Nadiga, Balasubramanya T., Liu, Jason, Ahmed, Kishwar, and Obaida, Mohammad Abu. Performance Prediction Toolkit. Computer software. https://www.osti.gov//servlets/purl/1401959. Vers. 00. USDOE. 25 Sep. 2017. Web.
Chennupati, Gopinath, Santhi, Nanadakishore, Eidenbenz, Stephen, Zerr, Robert Joseph, Rosa, Massimiliano, Zamora, Richard James, Park, Eun Jung, Nadiga, Balasubramanya T., Liu, Jason, Ahmed, Kishwar, & Obaida, Mohammad Abu. (2017, September 25). Performance Prediction Toolkit (Version 00) [Computer software]. https://www.osti.gov//servlets/purl/1401959.
Chennupati, Gopinath, Santhi, Nanadakishore, Eidenbenz, Stephen, Zerr, Robert Joseph, Rosa, Massimiliano, Zamora, Richard James, Park, Eun Jung, Nadiga, Balasubramanya T., Liu, Jason, Ahmed, Kishwar, and Obaida, Mohammad Abu. Performance Prediction Toolkit. Computer software. Version 00. September 25, 2017. https://www.osti.gov//servlets/purl/1401959.
@misc{osti_1401959,
title = {Performance Prediction Toolkit, Version 00},
author = {Chennupati, Gopinath and Santhi, Nanadakishore and Eidenbenz, Stephen and Zerr, Robert Joseph and Rosa, Massimiliano and Zamora, Richard James and Park, Eun Jung and Nadiga, Balasubramanya T. and Liu, Jason and Ahmed, Kishwar and Obaida, Mohammad Abu},
abstractNote = {The Performance Prediction Toolkit (PPT) is a scalable co-design tool that contains hardware and middleware models, which accept proxy applications as input for runtime prediction. PPT relies on Simian, a parallel discrete event simulation engine in Python or Lua that uses the process concept, where each computing unit (host, node, core) is a Simian entity. Processes perform their tasks through message exchanges that let them remain active, sleep, wake up, begin, and end. The PPT hardware model of a compute core (such as a Haswell core) consists of a set of parameters, such as clock speed, memory hierarchy levels, their respective sizes, cache lines, access times for the different cache levels, and average cycle counts of ALU operations. These parameters are ideally read off a spec sheet or are learned with regression models fitted to hardware counter (PAPI) data. The compute core model offers an API to the software model: a function called time_compute(), which takes a tasklist as input. A tasklist is an unordered set of ALU and other CPU-type operations (in particular, virtual memory loads and stores). The PPT application model mimics the loop structure of the application and replaces the computational kernels with calls to the hardware model's time_compute() function, giving tasklists that model the compute kernels as input. A PPT application model thus consists of tasklists representing kernels and the higher-level loop structure, which we like to think of as pseudocode. The key challenge for the hardware model's time_compute() function is to translate virtual memory accesses into actual cache hierarchy level hits and misses. PPT also contains another CPU core level hardware model, the Analytical Memory Model (AMM). The AMM solves this challenge soundly, whereas our previous alternatives explicitly included the L1, L2, and L3 hit rates as inputs to the tasklists. Explicit hit rates inevitably reflect only the application modeler's best guess, perhaps informed by a few small test problems using hardware counters; in addition, hard-coded hit rates make the hardware model insensitive to changes in cache sizes. Instead, we use reuse distance distributions in the tasklists. In general, reuse profiles require the application modeler to run a very expensive trace analysis on the real code, which realistically can be done at best for small examples.},
url = {https://www.osti.gov//servlets/purl/1401959},
doi = {},
year = {2017},
month = {9}
}

Software:
To order this software, request consultation services, or receive further information, please fill out the following request.

OSTI staff will begin to process an order for scientific and technical software once the payment and signed site license agreement are received. If the forms are not in order, OSTI will contact you. No further action will be taken until all required information and/or payment is received. Orders are usually processed within three to five business days.
