OSTI.GOV: U.S. Department of Energy
Office of Scientific and Technical Information

Title: PRIMA-X - Performance Retargeting of Instrumentation, Measurement, and Analysis Technologies for Exascale Computing

Abstract

The PRIMA-X (Performance Retargeting of Instrumentation, Measurement, and Analysis Technologies for Exascale Computing) project is the successor of the DOE PRIMA (Performance Refactoring of Instrumentation, Measurement, and Analysis Technologies for Petascale Computing) project, which addressed the challenge of creating a core measurement infrastructure that would serve as a common platform for both integrating leading parallel performance systems (notably TAU and Scalasca) and developing next-generation scalable performance tools. The PRIMA-X project shifts the focus away from refactorization of robust performance tools towards a re-targeting of the parallel performance measurement and analysis architecture for extreme scales. The massive concurrency, asynchronous execution dynamics, hardware heterogeneity, and multi-objective prerequisites (performance, power, resilience) that identify exascale systems introduce fundamental constraints on the ability to carry forward existing performance methodologies. In particular, there must be a deemphasis of per-thread observation techniques to significantly reduce the otherwise unsustainable flood of redundant performance data. Instead, it will be necessary to assimilate multi-level resource observations into macroscopic performance views, from which resilient performance metrics can be attributed to the computational features of the application. This requires a scalable framework for node-level and system-wide monitoring and runtime analyses of dynamic performance information. Also, the interest in optimizing parallelism parameters with respect to performance and energy drives the integration of tool capabilities in the exascale environment further. Initially, PRIMA-X was a collaborative project between the University of Oregon (lead institution) and the German Research School for Simulation Sciences (GRS). Because Prof. Wolf, the PI at GRS, accepted a position as full professor at Technische Universitaet Darmstadt (TU Darmstadt) starting February 1st, 2015, the work of GRS was continued at TU Darmstadt. This report covers the work at TU Darmstadt after the transition. The first main accomplishment of TU Darmstadt is the development of different techniques to aggregate performance data on the level of threads. TU Darmstadt evaluated different schemes designed at GRS during the first phase of this project and integrated them into Score-P, a widely used performance measurement framework for HPC applications. The second main accomplishment is a substantial increase of Score-P’s scalability, achieved by improving its internal representation of the underlying system. Third, we extended our performance-modeling tool Extra-P to automatically create models with more than one input parameter (e.g., process count and input size), which was previously impossible. These models allow the performance behavior of an application to be extrapolated to larger machines. Specific applications of the new multi-parameter capability realized in this project include modeling the isoefficiency function of task-based applications and a novel co-design methodology. Further extensions of Extra-P comprise online modeling functionality, automatic refinement of the model search space, and segmented performance modeling. Further contributions include new locality metrics for both shared- and distributed-memory applications, a new way of measuring the impact of inter-application interference on the execution time of an application, and finally an automatic method that selects appropriate OpenMP constructs for the parallelization of sequential code and classifies the affected variables based on their sharing semantics.

Authors:
Wolf, Felix [1]; Lorenz, Daniel [1]
  1. Technische Universitaet Darmstadt
Publication Date:
Research Org.:
Technische Universitaet Darmstadt
Sponsoring Org.:
USDOE Office of Science (SC)
Contributing Org.:
Juelich Supercomputing Centre
OSTI Identifier:
1529499
Report Number(s):
DOE-TUDA-15524
DOE Contract Number:  
SC0015524
Resource Type:
Technical Report
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; high-performance computing, exascale, scalability, performance measurement, performance modelling

Citation Formats

Wolf, Felix, and Lorenz, Daniel. PRIMA-X - Performance Retargeting of Instrumentation, Measurement, and Analysis Technologies for Exascale Computing. United States: N. p., 2019. Web. doi:10.2172/1529499.
Wolf, Felix, & Lorenz, Daniel. PRIMA-X - Performance Retargeting of Instrumentation, Measurement, and Analysis Technologies for Exascale Computing. United States. doi:10.2172/1529499.
Wolf, Felix, and Lorenz, Daniel. Thu . "PRIMA-X - Performance Retargeting of Instrumentation, Measurement, and Analysis Technologies for Exascale Computing". United States. doi:10.2172/1529499. https://www.osti.gov/servlets/purl/1529499.
@article{osti_1529499,
title = {PRIMA-X - Performance Retargeting of Instrumentation, Measurement, and Analysis Technologies for Exascale Computing},
author = {Wolf, Felix and Lorenz, Daniel},
doi = {10.2172/1529499},
number = {DOE-TUDA-15524},
place = {United States},
year = {2019},
month = {6}
}

Works referenced in this record:

Engineering Algorithms for Scalability through Continuous Validation of Performance Expectations
journal, August 2019

  • Shudler, Sergei; Berens, Yannick; Calotoiu, Alexandru
  • IEEE Transactions on Parallel and Distributed Systems, Vol. 30, Issue 8
  • DOI: 10.1109/TPDS.2019.2896993

The Tau Parallel Performance System
journal, May 2006

  • Shende, Sameer S.; Malony, Allen D.
  • The International Journal of High Performance Computing Applications, Vol. 20, Issue 2
  • DOI: 10.1177/1094342006064482

The Scalasca performance toolset architecture
journal, January 2010

  • Geimer, Markus; Wolf, Felix; Wylie, Brian J. N.
  • Concurrency and Computation: Practice and Experience
  • DOI: 10.1002/cpe.1556

Unveiling parallelization opportunities in sequential programs
journal, July 2016