Title: Adaptive Learning for Concept Drift in Application Performance Modeling

Abstract

Supervised learning is a promising approach for modeling the performance of applications running on large HPC systems. A key assumption in supervised learning is that the training and testing data are obtained under the same conditions. However, in production HPC systems these conditions might not hold because the conditions of the platform can change over time as a result of hardware degradation, hardware replacement, software upgrade, and configuration updates. These changes could alter the data distribution in a way that affects the accuracy of the predictive performance models and render them less useful; this phenomenon is referred to as concept drift. Ignoring concept drift can lead to suboptimal resource usage and decreased efficiency when those performance models are deployed for tuning and job scheduling in production systems. To address this issue, we propose a concept-drift-aware predictive modeling approach that comprises two components: (1) an online Bayesian changepoint detection method that can automatically identify the location of events that lead to concept drift in near-real time and (2) a moment-matching transformation inspired by transfer learning that converts the training data collected before the drift to be useful for retraining. We use application input/output performance data collected on Cori, a production supercomputing system at the National Energy Research Scientific Computing Center, to demonstrate the effectiveness of our approach. The results show that concept-drift-aware models obtain significant improvement in accuracy; the median absolute error of the best-performing Gaussian process regression improved by 58.8% when the proposed approaches were used.
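
The abstract does not spell out the moment-matching transformation, so the following is only a minimal sketch of the general idea, assuming a simple first- and second-moment match: pre-drift values are shifted and scaled so that their mean and standard deviation agree with those of the post-drift sample. The function name `moment_match` and the sample values are hypothetical and are not taken from the paper.

```python
import numpy as np


def moment_match(pre_drift, post_drift):
    """Shift and scale pre-drift values so their mean and standard
    deviation match those of the post-drift sample.

    Illustrative assumption only; the paper's actual transformation
    may differ.
    """
    pre = np.asarray(pre_drift, dtype=float)
    post = np.asarray(post_drift, dtype=float)
    pre_std = pre.std()
    if pre_std == 0.0:
        # Degenerate case: constant pre-drift data can only be shifted.
        return np.full_like(pre, post.mean())
    # Standardize with pre-drift moments, then rescale with post-drift moments.
    return (pre - pre.mean()) / pre_std * post.std() + post.mean()


# Hypothetical performance values in arbitrary units (not data from the paper).
pre = np.array([10.0, 12.0, 14.0, 16.0])
post = np.array([20.0, 26.0])
adjusted = moment_match(pre, post)
```

Under this assumption, the adjusted pre-drift points could be pooled with the few post-drift observations when a model is retrained after a detected changepoint.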

Authors:
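Madireddy, Sandeep; Balaprakash, Prasanna; Carns, Philip; Latham, Robert; Lockwood, Glenn K.; Ross, Robert; Snyder, Shane; Wild, Stefan M.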
Publication Date:
Research Org.:
Argonne National Lab. (ANL), Argonne, IL (United States)
Sponsoring Org.:
USDOE Office of Science - Office of Advanced Scientific Computing Research - Scientific Discovery through Advanced Computing (SciDAC)
OSTI Identifier:
1574301
DOE Contract Number:  
AC02-06CH11357
Resource Type:
Conference
Resource Relation:
Conference: 48th International Conference on Parallel Processing, 08/05/19 - 08/08/19, Kyoto, JP
Country of Publication:
United States
Language:
English
Subject:
HPC performance modeling; I/O performance models; adaptive learning; concept drift; online change point detection; temporal learning

Citation Formats

Madireddy, Sandeep, Balaprakash, Prasanna, Carns, Philip, Latham, Robert, Lockwood, Glenn K., Ross, Robert, Snyder, Shane, and Wild, Stefan M. Adaptive Learning for Concept Drift in Application Performance Modeling. United States: N. p., 2019. Web. doi:10.1145/3337821.3337922.
Madireddy, Sandeep, Balaprakash, Prasanna, Carns, Philip, Latham, Robert, Lockwood, Glenn K., Ross, Robert, Snyder, Shane, & Wild, Stefan M. Adaptive Learning for Concept Drift in Application Performance Modeling. United States. doi:10.1145/3337821.3337922.
Madireddy, Sandeep, Balaprakash, Prasanna, Carns, Philip, Latham, Robert, Lockwood, Glenn K., Ross, Robert, Snyder, Shane, and Wild, Stefan M. "Adaptive Learning for Concept Drift in Application Performance Modeling". United States. doi:10.1145/3337821.3337922.
@article{osti_1574301,
title = {Adaptive Learning for Concept Drift in Application Performance Modeling},
author = {Madireddy, Sandeep and Balaprakash, Prasanna and Carns, Philip and Latham, Robert and Lockwood, Glenn K. and Ross, Robert and Snyder, Shane and Wild, Stefan M.},
abstractNote = {Supervised learning is a promising approach for modeling the performance of applications running on large HPC systems. A key assumption in supervised learning is that the training and testing data are obtained under the same conditions. However, in production HPC systems these conditions might not hold because the conditions of the platform can change over time as a result of hardware degradation, hardware replacement, software upgrade, and configuration updates. These changes could alter the data distribution in a way that affects the accuracy of the predictive performance models and render them less useful; this phenomenon is referred to as concept drift. Ignoring concept drift can lead to suboptimal resource usage and decreased efficiency when those performance models are deployed for tuning and job scheduling in production systems. To address this issue, we propose a concept-drift-aware predictive modeling approach that comprises two components: (1) an online Bayesian changepoint detection method that can automatically identify the location of events that lead to concept drift in near-real time and (2) a moment-matching transformation inspired by transfer learning that converts the training data collected before the drift to be useful for retraining. We use application input/output performance data collected on Cori, a production supercomputing system at the National Energy Research Scientific Computing Center, to demonstrate the effectiveness of our approach. The results show that concept-drift-aware models obtain significant improvement in accuracy; the median absolute error of the best-performing Gaussian process regression improved by 58.8% when the proposed approaches were used.},
doi = {10.1145/3337821.3337922},
journal = {},
number = {},
volume = {},
place = {United States},
year = {2019},
month = {1}
}
