DOE Data Explorer, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Data augmentation for disruption prediction via robust surrogate models

Abstract

The goal of this work is to generate large, statistically representative datasets for training machine learning models for disruption prediction, given data from only a few existing discharges. Such a comprehensive training database is important for achieving satisfactory and reliable prediction results with artificial neural network classifiers. Here, we aim for a robust augmentation of the training database for multivariate time series data using Student-t process regression. We apply Student-t process regression in a state space formulation via Bayesian filtering to tackle challenges imposed by outliers and noise in the training data set and to reduce the computational complexity. Thus, the method can also be used when the time resolution is high. We use an uncorrelated model for each dimension and impose correlations afterwards via coloring transformations. We demonstrate the efficacy of our approach on plasma diagnostics data of three different disruption classes from the DIII-D tokamak. To evaluate whether the distribution of the generated data is similar to that of the training data, we additionally perform statistical analyses using methods from time series analysis, descriptive statistics, and classic machine learning clustering algorithms.
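
The coloring transformation mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes Gaussian white samples rather than draws from a Student-t process, a Cholesky factor as the coloring matrix, and a made-up 2x2 target covariance (`target_cov`). All function names are hypothetical.

```python
import math
import random


def cholesky(cov):
    """Return lower-triangular L with L * L^T == cov (cov symmetric positive definite)."""
    n = len(cov)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(cov[i][i] - s)
            else:
                L[i][j] = (cov[i][j] - s) / L[j][j]
    return L


def color(white, cov):
    """Map uncorrelated, unit-variance samples (one per row) to samples with covariance cov."""
    L = cholesky(cov)
    n = len(cov)
    return [[sum(L[i][k] * w[k] for k in range(i + 1)) for i in range(n)]
            for w in white]


def sample_covariance(samples):
    """Unbiased sample covariance matrix of a list of equally sized rows."""
    n = len(samples)
    d = len(samples[0])
    means = [sum(s[i] for s in samples) / n for i in range(d)]
    return [[sum((s[i] - means[i]) * (s[j] - means[j]) for s in samples) / (n - 1)
             for j in range(d)] for i in range(d)]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
target_cov = [[1.0, 0.8], [0.8, 2.0]]  # assumed target covariance, for illustration only
white = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(20000)]
colored = color(white, target_cov)
estimated_cov = sample_covariance(colored)  # should be close to target_cov
```

The design point is that each dimension can be sampled independently, which keeps the per-signal model simple, and the cross-signal correlation is restored afterwards by a single linear map.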

Authors:
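Rath, Katharina; Rügamer, David; Bischl, Bernd; von Toussaint, Udo; Rea, Cristina; Maris, Andrew; Granetz, Robert; Albert, Christopher G.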
Publication Date:
2022-06-06
DOE Contract Number:  
SC0014264; FC02-04ER54698
Research Org.:
Massachusetts Inst. of Technology (MIT), Cambridge, MA (United States). Plasma Science and Fusion Center; General Atomics, San Diego, CA (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Fusion Energy Sciences (FES)
Subject:
97 MATHEMATICS AND COMPUTING
OSTI Identifier:
1887951
DOI:
https://doi.org/10.7910/DVN/FMJCAD

Citation Formats

Rath, Katharina, Rügamer, David, Bischl, Bernd, von Toussaint, Udo, Rea, Cristina, Maris, Andrew, Granetz, Robert, and Albert, Christopher G. Data augmentation for disruption prediction via robust surrogate models. United States: N. p., 2022. Web. doi:10.7910/DVN/FMJCAD.
Rath, Katharina, Rügamer, David, Bischl, Bernd, von Toussaint, Udo, Rea, Cristina, Maris, Andrew, Granetz, Robert, & Albert, Christopher G. Data augmentation for disruption prediction via robust surrogate models. United States. doi:https://doi.org/10.7910/DVN/FMJCAD
Rath, Katharina, Rügamer, David, Bischl, Bernd, von Toussaint, Udo, Rea, Cristina, Maris, Andrew, Granetz, Robert, and Albert, Christopher G. 2022. "Data augmentation for disruption prediction via robust surrogate models". United States. doi:https://doi.org/10.7910/DVN/FMJCAD. https://www.osti.gov/servlets/purl/1887951. Pub date:Mon Jun 06 04:00:00 UTC 2022
@article{osti_1887951,
title = {Data augmentation for disruption prediction via robust surrogate models},
author = {Rath, Katharina and Rügamer, David and Bischl, Bernd and von Toussaint, Udo and Rea, Cristina and Maris, Andrew and Granetz, Robert and Albert, Christopher G.},
abstractNote = {The goal of this work is to generate large, statistically representative datasets for training machine learning models for disruption prediction, given data from only a few existing discharges. Such a comprehensive training database is important for achieving satisfactory and reliable prediction results with artificial neural network classifiers. Here, we aim for a robust augmentation of the training database for multivariate time series data using Student-t process regression. We apply Student-t process regression in a state space formulation via Bayesian filtering to tackle challenges imposed by outliers and noise in the training data set and to reduce the computational complexity. Thus, the method can also be used when the time resolution is high. We use an uncorrelated model for each dimension and impose correlations afterwards via coloring transformations. We demonstrate the efficacy of our approach on plasma diagnostics data of three different disruption classes from the DIII-D tokamak. To evaluate whether the distribution of the generated data is similar to that of the training data, we additionally perform statistical analyses using methods from time series analysis, descriptive statistics, and classic machine learning clustering algorithms.},
doi = {10.7910/DVN/FMJCAD},
place = {United States},
year = {2022},
month = jun
}