OSTI.GOV · U.S. Department of Energy
Office of Scientific and Technical Information

Title: PRIMA-X - Performance Retargeting of Instrumentation, Measurement, and Analysis Technologies for Exascale Computing

Technical Report · DOI: https://doi.org/10.2172/1529499 · OSTI ID: 1529499

The PRIMA-X (Performance Retargeting of Instrumentation, Measurement, and Analysis Technologies for Exascale Computing) project is the successor to the DOE PRIMA (Performance Refactoring of Instrumentation, Measurement, and Analysis Technologies for Petascale Computing) project, which addressed the challenge of creating a core measurement infrastructure that would serve as a common platform both for integrating leading parallel performance systems (notably TAU and Scalasca) and for developing next-generation scalable performance tools. PRIMA-X shifts the focus away from refactoring robust performance tools and toward retargeting the parallel performance measurement and analysis architecture for extreme scales. The massive concurrency, asynchronous execution dynamics, hardware heterogeneity, and multi-objective requirements (performance, power, resilience) that characterize exascale systems fundamentally constrain the ability to carry existing performance methodologies forward. In particular, per-thread observation techniques must be de-emphasized to substantially reduce what would otherwise be an unsustainable flood of redundant performance data. Instead, multi-level resource observations will have to be assimilated into macroscopic performance views, from which resilient performance metrics can be attributed to the computational features of the application. This requires a scalable framework for node-level and system-wide monitoring and for runtime analysis of dynamic performance information. In addition, the interest in optimizing parallelism parameters with respect to both performance and energy further drives the integration of tool capabilities into the exascale environment.

PRIMA-X began as a collaborative project between the University of Oregon (lead institution) and the German Research School for Simulation Sciences (GRS). Because Prof. Wolf, the PI at GRS, accepted a position as full professor at Technische Universitaet Darmstadt (TU Darmstadt) starting February 1, 2015, the work of GRS was continued at TU Darmstadt. This report covers the work performed at TU Darmstadt after the transition.

The first main accomplishment of TU Darmstadt is the development of several techniques for aggregating performance data at the thread level. TU Darmstadt evaluated different aggregation schemes designed at GRS during the first phase of this project and integrated them into Score-P, a widely used performance measurement framework for HPC applications. The second main accomplishment is a substantial increase in Score-P's scalability, achieved by improving its internal representation of the underlying system. Third, we extended our performance-modeling tool Extra-P to automatically create models with more than one input parameter (e.g., process count and input size), which was previously impossible. Such models allow the performance behavior of an application to be extrapolated to larger machines. Specific applications of the new multi-parameter capability realized in this project include modeling the isoefficiency function of task-based applications and a novel co-design methodology. Further extensions of Extra-P comprise online modeling functionality, automatic refinement of the model search space, and segmented performance modeling. Additional contributions include new locality metrics for both shared- and distributed-memory applications, a new way of measuring the impact of inter-application interference on the execution time of an application, and an automatic method that selects appropriate OpenMP constructs for parallelizing sequential code and classifies the affected variables according to their sharing semantics.
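To make the idea of a multi-parameter performance model more concrete, the following Python sketch fits models of the form t(p, n) = c0 + c1 · p^i · log2(p)^j · n^k · log2(n)^l to runtimes observed at several process counts p and input sizes n, and then uses the best-fitting model to extrapolate to a larger process count. It is a simplified illustration under stated assumptions, not Extra-P's implementation: the exponent sets, the brute-force search over single-term hypotheses, and selection by residual sum of squares are choices made here for brevity, and the sample data are synthetic rather than measurements from this project. Extra-P's own model normal form and search strategy are more elaborate.

```python
import itertools
import math

# Hypothetical exponent sets chosen for this sketch; Extra-P's real search space differs.
POLY_EXPONENTS = [0, 0.5, 1, 1.5, 2, 3]
LOG_EXPONENTS = [0, 1, 2]


def term(p, n, i, j, k, l):
    """Evaluate the single-term hypothesis p^i * log2(p)^j * n^k * log2(n)^l."""
    return (p ** i) * (math.log2(p) ** j) * (n ** k) * (math.log2(n) ** l)


def fit_linear(xs, ys):
    """Ordinary least squares for y = c0 + c1 * x; returns (c0, c1, rss)."""
    m = len(xs)
    mean_x = sum(xs) / m
    mean_y = sum(ys) / m
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        # Constant regressor (all exponents zero): the model reduces to the mean.
        c1 = 0.0
    else:
        c1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    c0 = mean_y - c1 * mean_x
    rss = sum((y - (c0 + c1 * x)) ** 2 for x, y in zip(xs, ys))
    return c0, c1, rss


def best_model(samples):
    """Try every exponent combination and keep the one with the smallest RSS.

    samples: list of (p, n, runtime) tuples with p >= 2 and n >= 2.
    Returns (rss, c0, c1, (i, j, k, l)).
    """
    ys = [t for _, _, t in samples]
    best = None
    for i, j, k, l in itertools.product(
        POLY_EXPONENTS, LOG_EXPONENTS, POLY_EXPONENTS, LOG_EXPONENTS
    ):
        xs = [term(p, n, i, j, k, l) for p, n, _ in samples]
        c0, c1, rss = fit_linear(xs, ys)
        if best is None or rss < best[0]:
            best = (rss, c0, c1, (i, j, k, l))
    return best


def predict(model, p, n):
    """Evaluate a fitted model at a new (p, n) configuration."""
    _, c0, c1, (i, j, k, l) = model
    return c0 + c1 * term(p, n, i, j, k, l)


if __name__ == "__main__":
    # Synthetic measurements generated from t = 3 + 0.01 * p * n (illustration only).
    samples = [
        (p, n, 3 + 0.01 * p * n)
        for p in (2, 4, 8, 16, 32)
        for n in (100, 200, 300, 400, 500)
    ]
    model = best_model(samples)
    print("exponents (i, j, k, l):", model[3])
    print("predicted runtime at p=1024, n=100:", predict(model, 1024, 100))
```

With the synthetic data above, the search recovers the exponents (1, 0, 1, 0) and extrapolates a runtime of about 1027 at p = 1024, n = 100. Restricting each hypothesis to a single term keeps the exhaustive search small (6 × 3 × 6 × 3 = 324 candidates); allowing more terms or more parameters grows the space combinatorially, which is one reason a practical tool benefits from a smarter search strategy.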

Research Organization:
Technische Universität Darmstadt (Germany)
Sponsoring Organization:
USDOE Office of Science (SC)
Contributing Organization:
Jülich Supercomputing Centre
DOE Contract Number:
SC0015524
OSTI ID:
1529499
Report Number(s):
DOE-TUDA-15524
Country of Publication:
United States
Language:
English

References (22)

Understanding the Scalability of Molecular Simulation Using Empirical Performance Modeling (book, April 2019)
Characterizing Loop-Level Communication Patterns in Shared Memory (conference, September 2015)
Using Deep Learning for Automated Communication Pattern Characterization: Little Steps and Big Challenges (book, April 2019)
Score-P: A Joint Performance Measurement Run-Time Infrastructure for Periscope, Scalasca, TAU, and Vampir (book, January 2012)
Fast Multi-parameter Performance Modeling (conference, September 2016)
An Efficient Data-Dependence Profiler for Sequential and Parallel Programs (conference, May 2015)
Lightweight Requirements Engineering for Exascale Co-design (conference, September 2018)
Unveiling Thread Communication Bottlenecks Using Hardware-Independent Metrics (conference, January 2018)
Engineering Algorithms for Scalability through Continuous Validation of Performance Expectations (journal, August 2019)
Barcelona OpenMP Tasks Suite: A Set of Benchmarks Targeting the Exploitation of Task Parallelism in OpenMP (conference, September 2009)
The Tau Parallel Performance System (journal, May 2006)
Following the Blind Seer – Creating Better Performance Models Using Less Information (book, August 2017)
Preventing the explosion of exascale profile data with smart thread-level aggregation (conference, January 2015)
Scalable Algorithms for Constructing Balanced Spanning Trees on System-Ranked Process Groups (book, January 2012)
The Scalasca performance toolset architecture (journal, January 2010)
Off-Road Performance Modeling – How to Deal with Segmented Data (book, August 2017)
Estimating the Impact of External Interference on Application Performance (book, August 2018)
Scaling Score-P to the next level (journal, January 2017)
Unveiling parallelization opportunities in sequential programs (journal, July 2016)
Isoefficiency in Practice: Configuring and Understanding the Performance of Task-based Applications (conference, January 2017). Shudler, Sergei; Calotoiu, Alexandru; Hoefler, Torsten. Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP '17). https://doi.org/10.1145/3018743.3018770
Exascale Algorithms for Generalized MPI_Comm_split (book, January 2011)
Using automated performance modeling to find scalability bugs in complex codes (conference, January 2013). Calotoiu, Alexandru; Hoefler, Torsten; Poke, Marius. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC '13). https://doi.org/10.1145/2503210.2503277