DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Cluster, classify, regress: A general method for learning discontinuous functions

Abstract

This paper presents a method for solving the supervised learning problem in which the output is highly nonlinear and discontinuous. Here, it is proposed to solve this problem in three stages: (i) cluster the pairs of input-output data points, resulting in a label for each point; (ii) classify the data, where the corresponding label is the output; and finally (iii) perform one separate regression for each class, where the training data corresponds to the subset of the original input-output pairs which have that label according to the classifier. It has not yet been proposed to combine these three fundamental building blocks of machine learning in this simple and powerful fashion. This can be viewed as a form of deep learning, where any of the intermediate layers can itself be deep. The utility and robustness of the methodology are illustrated on some toy problems, including one example problem arising from simulation of plasma fusion in a tokamak.
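
The three stages described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say which clustering, classification, or regression algorithms are used, so k-means, a 1-nearest-neighbour classifier, and per-class linear least squares are stand-ins chosen for brevity. All function names (`kmeans`, `nn_classify`, `fit_linear`, `ccr_fit`, `ccr_predict`) and the toy step function are hypothetical.

```python
import numpy as np

def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means (assumed clusterer); returns one label per row."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels

def nn_classify(x_train, labels, x_query):
    """1-nearest-neighbour classifier on the inputs only (assumed classifier)."""
    dists = np.linalg.norm(x_query[:, None, :] - x_train[None, :, :], axis=2)
    return labels[dists.argmin(axis=1)]

def fit_linear(x, y):
    """Least-squares fit of y ~ x @ w + b (assumed regressor)."""
    design = np.hstack([x, np.ones((len(x), 1))])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def ccr_fit(x, y, k):
    # (i) cluster the input-output pairs to get a label for each point
    labels = kmeans(np.hstack([x, y[:, None]]), k)
    # (ii) classify: the classifier maps inputs to cluster labels
    train_labels = nn_classify(x, labels, x)
    # (iii) one regression per class, trained on the points the classifier
    #       assigns to that class
    models = {c: fit_linear(x[train_labels == c], y[train_labels == c])
              for c in np.unique(train_labels)}
    return x, labels, models

def ccr_predict(model, x_new):
    x_train, labels, models = model
    pred_labels = nn_classify(x_train, labels, x_new)
    design = np.hstack([x_new, np.ones((len(x_new), 1))])
    out = np.full(len(x_new), np.nan)
    for c, coef in models.items():
        mask = pred_labels == c
        out[mask] = design[mask] @ coef
    return out

# Toy discontinuous target (hypothetical): a unit-slope line with a jump of 10 at x = 0.
x = np.linspace(-1.0, 1.0, 200)[:, None]
y = np.where(x[:, 0] < 0.0, x[:, 0], x[:, 0] + 10.0)
model = ccr_fit(x, y, k=2)
x_test = np.array([[-0.5], [0.5]])
print(ccr_predict(model, x_test))
```

On this toy problem each branch is exactly linear, so the two predictions should land close to -0.5 and 10.5. A single global regressor would instead smear the jump across both sides of the discontinuity.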

Authors:
Bernholdt, David E. [1]; Cianciosa, Mark R. [2]; Green, David L. [2]; Park, Jin M. [2]; Law, Kody J. H. [3]; Etienam, Clement [3]
  1. Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Computer Science and Mathematics Division
  2. Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Fusion Energy Division
  3. Univ. of Manchester (United Kingdom). Dept. of Mathematics
Publication Date:
Research Org.:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Fusion Energy Sciences (FES); USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)
OSTI Identifier:
1648947
Grant/Contract Number:  
AC05-00OR22725
Resource Type:
Accepted Manuscript
Journal Name:
Foundations of Data Science
Additional Journal Information:
Journal Volume: 1; Journal Issue: 4; Journal ID: ISSN 2639-8001
Publisher:
AIMS
Country of Publication:
United States
Language:
English
Subject:
96 KNOWLEDGE MANAGEMENT AND PRESERVATION; machine learning, discontinuous functions, supervised learning

Citation Formats

Bernholdt, David E., Cianciosa, Mark R., Green, David L., Park, Jin M., Law, Kody J. H., and Etienam, Clement. Cluster, classify, regress: A general method for learning discontinuous functions. United States: N. p., 2019. Web. https://doi.org/10.3934/fods.2019020.
Bernholdt, David E., Cianciosa, Mark R., Green, David L., Park, Jin M., Law, Kody J. H., & Etienam, Clement. Cluster, classify, regress: A general method for learning discontinuous functions. United States. https://doi.org/10.3934/fods.2019020
Bernholdt, David E., Cianciosa, Mark R., Green, David L., Park, Jin M., Law, Kody J. H., and Etienam, Clement. 2019. "Cluster, classify, regress: A general method for learning discontinuous functions". United States. https://doi.org/10.3934/fods.2019020. https://www.osti.gov/servlets/purl/1648947.
@article{osti_1648947,
title = {Cluster, classify, regress: A general method for learning discontinuous functions},
author = {Bernholdt, David E. and Cianciosa, Mark R. and Green, David L. and Park, Jin M. and Law, Kody J. H. and Etienam, Clement},
abstractNote = {This paper presents a method for solving the supervised learning problem in which the output is highly nonlinear and discontinuous. Here, it is proposed to solve this problem in three stages: (i) cluster the pairs of input-output data points, resulting in a label for each point; (ii) classify the data, where the corresponding label is the output; and finally (iii) perform one separate regression for each class, where the training data corresponds to the subset of the original input-output pairs which have that label according to the classifier. It has not yet been proposed to combine these 3 fundamental building blocks of machine learning in this simple and powerful fashion. This can be viewed as a form of deep learning, where any of the intermediate layers can itself be deep. The utility and robustness of the methodology is illustrated on some toy problems, including one example problem arising from simulation of plasma fusion in a tokamak.},
doi = {10.3934/fods.2019020},
journal = {Foundations of Data Science},
number = 4,
volume = 1,
place = {United States},
year = {2019},
month = {12}
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record
