skip to main content
OSTI.GOV title logo U.S. Department of Energy
Office of Scientific and Technical Information

Title: Diagnosing Performance Variations in HPC Architectures Using Machine Learning.


Abstract not provided.

; ; ; ; ; ; ;
Publication Date:
Research Org.:
Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
Sponsoring Org.:
USDOE National Nuclear Security Administration (NNSA)
OSTI Identifier:
Report Number(s):
DOE Contract Number:
Resource Type:
Resource Relation:
Conference: Proposed for presentation at the ISC HPC 2017 held June 18-22, 2017 in Frankfurt, Germany.
Country of Publication:
United States

Citation Formats

Tuncer, Ozan, Ates, Emre, Zhang, Yijia, Turk, Ata, Brandt, James M., Leung, Vitus J., Egele, Manuel, and Coskun, Ayse K. Diagnosing Performance Variations in HPC Architectures Using Machine Learning.. United States: N. p., 2017. Web.
Tuncer, Ozan, Ates, Emre, Zhang, Yijia, Turk, Ata, Brandt, James M., Leung, Vitus J., Egele, Manuel, & Coskun, Ayse K. Diagnosing Performance Variations in HPC Architectures Using Machine Learning.. United States.
Tuncer, Ozan, Ates, Emre, Zhang, Yijia, Turk, Ata, Brandt, James M., Leung, Vitus J., Egele, Manuel, and Coskun, Ayse K. Wed . "Diagnosing Performance Variations in HPC Architectures Using Machine Learning.". United States. doi:.
title = {Diagnosing Performance Variations in HPC Architectures Using Machine Learning.},
author = {Tuncer, Ozan and Ates, Emre and Zhang, Yijia and Turk, Ata and Brandt, James M. and Leung, Vitus J. and Egele, Manuel and Coskun, Ayse K.},
abstractNote = {Abstract not provided.},
doi = {},
journal = {},
number = ,
volume = ,
place = {United States},
year = {Wed Mar 01 00:00:00 EST 2017},
month = {Wed Mar 01 00:00:00 EST 2017}

Other availability
Please see Document Availability for additional information on obtaining the full-text document. Library patrons may search WorldCat to identify libraries that hold this conference proceeding.

Save / Share:
  • Most researchers with little high performance computing (HPC) experience have difficulties productively using the supercomputing resources. To address this issue, we investigated usage behaviors of the world s fastest academic Kraken supercomputer, and built a knowledge-based recommendation system to improve user productivity. Six clustering techniques, along with three cluster validation measures, were implemented to investigate the underlying patterns of usage behaviors. Besides manually defining a category for very large job submissions, six behavior categories were identified, which cleanly separated the data intensive jobs and computational intensive jobs. Then, job statistics of each behavior category were used to develop a knowledge-basedmore » recommendation system that can provide users with instructions about choosing appropriate software packages, setting job parameter values, and estimating job queuing time and runtime. Experiments were conducted to evaluate the performance of the proposed recommendation system, which included 127 job submissions by users from different research fields. Great feedback indicated the usefulness of the provided information. The average runtime estimation accuracy of 64.2%, with 28.9% job termination rate, was achieved in the experiments, which almost doubled the average accuracy in the Kraken dataset.« less
  • Efficient parallel implementations of scientific applications on multi-core CPUs with accelerators such as GPUs and Xeon Phis is challenging. This requires - exploiting the data parallel architecture of the accelerator along with the vector pipelines of modern x86 CPU architectures, load balancing, and efficient memory transfer between different devices. It is relatively easy to meet these requirements for highly structured scientific applications. In contrast, a number of scientific and engineering applications are unstructured. Getting performance on accelerators for these applications is extremely challenging because many of these applications employ irregular algorithms which exhibit data-dependent control-ow and irregular memory accesses. Furthermore,more » these applications are often iterative with dependency between steps, and thus making it hard to parallelize across steps. As a result, parallelism in these applications is often limited to a single step. Numerical simulation of charged particles beam dynamics is one such application where the distribution of work and memory access pattern at each time step is irregular. Applications with these properties tend to present significant branch and memory divergence, load imbalance between different processor cores, and poor compute and memory utilization. Prior research on parallelizing such irregular applications have been focused around optimizing the irregular, data-dependent memory accesses and control-ow during a single step of the application independent of the other steps, with the assumption that these patterns are completely unpredictable. We observed that the structure of computation leading to control-ow divergence and irregular memory accesses in one step is similar to that in the next step. It is possible to predict this structure in the current step by observing the computation structure of previous steps. In this dissertation, we present novel machine learning based optimization techniques to address the parallel implementation challenges of such irregular applications on different HPC architectures. In particular, we use supervised learning to predict the computation structure and use it to address the control-ow and memory access irregularities in the parallel implementation of such applications on GPUs, Xeon Phis, and heterogeneous architectures composed of multi-core CPUs with GPUs or Xeon Phis. We use numerical simulation of charged particles beam dynamics simulation as a motivating example throughout the dissertation to present our new approach, though they should be equally applicable to a wide range of irregular applications. The machine learning approach presented here use predictive analytics and forecasting techniques to adaptively model and track the irregular memory access pattern at each time step of the simulation to anticipate the future memory access pattern. Access pattern forecasts can then be used to formulate optimization decisions during application execution which improves the performance of the application at a future time step based on the observations from earlier time steps. In heterogeneous architectures, forecasts can also be used to improve the memory performance and resource utilization of all the processing units to deliver a good aggregate performance. We used these optimization techniques and anticipation strategy to design a cache-aware, memory efficient parallel algorithm to address the irregularities in the parallel implementation of charged particles beam dynamics simulation on different HPC architectures. Experimental result using a diverse mix of HPC architectures shows that our approach in using anticipation strategy is effective in maximizing data reuse, ensuring workload balance, minimizing branch and memory divergence, and in improving resource utilization.« less
  • Wind turbine power output is known to be a strong function of wind speed, but is also affected by turbulence and shear. In this work, new aerostructural simulations of a generic 1.5 MW turbine are used to explore atmospheric influences on power output. Most significant is the hub height wind speed, followed by hub height turbulence intensity and then wind speed shear across the rotor disk. These simulation data are used to train regression trees that predict the turbine response for any combination of wind speed, turbulence intensity, and wind shear that might be expected at a turbine site. Formore » a randomly selected atmospheric condition, the accuracy of the regression tree power predictions is three times higher than that of the traditional power curve methodology. The regression tree method can also be applied to turbine test data and used to predict turbine performance at a new site. No new data is required in comparison to the data that are usually collected for a wind resource assessment. Implementing the method requires turbine manufacturers to create a turbine regression tree model from test site data. Such an approach could significantly reduce bias in power predictions that arise because of different turbulence and shear at the new site, compared to the test site.« less
  • Abstract not provided.
  • The rapid advances in the development of low-cost computer hardware have led to many proposals for the use of this hardware to improve the performance of database management systems. Usually the design proposals are quite vague about the performance of the system with respect to a given data management application. This paper develops an analytical model of the performance of a conventional database management system and four generic database machine architectures. This model is then used to compare the performance of each type of machine with a conventional DBMS. It is demonstrated that no one type of database machine ismore » best for executing all types of queries. It is also shown that for several classes of queries certain database machine designs which have been proposed are actually slower than a DBMS on a conventional processor.« less