Cluster, classify, regress: A general method for learning discontinuous functions
Abstract
This paper presents a method for solving the supervised learning problem in which the output is highly nonlinear and discontinuous. It is proposed to solve this problem in three stages: (i) cluster the pairs of input-output data points, resulting in a label for each point; (ii) classify the data, where the corresponding label is the output; and finally (iii) perform one separate regression for each class, where the training data corresponds to the subset of the original input-output pairs which have that label according to the classifier. It has not yet been proposed to combine these three fundamental building blocks of machine learning in this simple and powerful fashion. This can be viewed as a form of deep learning, where any of the intermediate layers can itself be deep. The utility and robustness of the methodology are illustrated on some toy problems, including one example problem arising from simulation of plasma fusion in a tokamak.
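The three stages described in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: k-means, a 1-nearest-neighbour classifier, and per-class straight-line fits are stand-ins chosen here for brevity, and all function names (`cluster`, `classify`, `fit_line`, `fit_ccr`, `predict_ccr`) and the toy step-function data are hypothetical.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def cluster(points, k, iters=20):
    """Stage (i): k-means on the joint (input, output) pairs.

    Centers are initialised deterministically (farthest-first),
    an assumption made here so the sketch is reproducible.
    """
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def classify(x, xs, labels):
    """Stage (ii): predict a cluster label from the input alone (1-nearest neighbour)."""
    nearest = min(range(len(xs)), key=lambda i: abs(xs[i] - x))
    return labels[nearest]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return slope, my - slope * mx


def fit_ccr(xs, ys, k):
    """Run the three stages; stage (iii) fits one regression per classifier label."""
    labels = cluster(list(zip(xs, ys)), k)
    predicted = [classify(x, xs, labels) for x in xs]  # labels according to the classifier
    models = {}
    for j in set(predicted):
        subset = [(x, y) for x, y, lab in zip(xs, ys, predicted) if lab == j]
        models[j] = fit_line(*zip(*subset))
    return labels, models


def predict_ccr(x, xs, labels, models):
    """Classify the new input, then evaluate that class's regression."""
    slope, intercept = models[classify(x, xs, labels)]
    return slope * x + intercept


# Toy discontinuous target: y = x below 0.5, y = x + 5 at and above 0.5.
xs = [i / 99 for i in range(100)]
ys = [x if x < 0.5 else x + 5.0 for x in xs]
labels, models = fit_ccr(xs, ys, k=2)
print(predict_ccr(0.25, xs, labels, models), predict_ccr(0.75, xs, labels, models))
```

Clustering on the joint input-output pairs is what lets the jump in the output separate the two branches; the classifier then has to recover that separation from the input alone, so each regression only ever sees one smooth piece. In practice each stage could be replaced by a more expressive model.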
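 Authors:
 Bernholdt, David E.; Cianciosa, Mark R.; Green, David L.; Park, Jin M.; Law, Kody J. H.; Etienam, Clement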
 Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Computer Science and Mathematics Division
 Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Fusion Energy Division
 Univ. of Manchester (United Kingdom). Dept. of Mathematics
 Publication Date:
 Research Org.:
 Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
 Sponsoring Org.:
 USDOE Office of Science (SC), Fusion Energy Sciences (FES); USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)
 OSTI Identifier:
 1648947
 Grant/Contract Number:
 AC05-00OR22725
 Resource Type:
 Accepted Manuscript
 Journal Name:
 Foundations of Data Science
 Additional Journal Information:
 Journal Volume: 1; Journal Issue: 4; Journal ID: ISSN 2639-8001
 Publisher:
 AIMS
 Country of Publication:
 United States
 Language:
 English
 Subject:
 96 KNOWLEDGE MANAGEMENT AND PRESERVATION; machine learning, discontinuous functions, supervised learning
Citation Formats
Bernholdt, David E., Cianciosa, Mark R., Green, David L., Park, Jin M., Law, Kody J. H., and Etienam, Clement. Cluster, classify, regress: A general method for learning discontinuous functions. United States: N. p., 2019. Web. https://doi.org/10.3934/fods.2019020.
Bernholdt, David E., Cianciosa, Mark R., Green, David L., Park, Jin M., Law, Kody J. H., & Etienam, Clement. Cluster, classify, regress: A general method for learning discontinuous functions. United States. https://doi.org/10.3934/fods.2019020
Bernholdt, David E., Cianciosa, Mark R., Green, David L., Park, Jin M., Law, Kody J. H., and Etienam, Clement. Sun . "Cluster, classify, regress: A general method for learning discontinuous functions". United States. https://doi.org/10.3934/fods.2019020. https://www.osti.gov/servlets/purl/1648947.
@article{osti_1648947,
title = {Cluster, classify, regress: A general method for learning discontinuous functions},
author = {Bernholdt, David E. and Cianciosa, Mark R. and Green, David L. and Park, Jin M. and Law, Kody J. H. and Etienam, Clement},
abstractNote = {This paper presents a method for solving the supervised learning problem in which the output is highly nonlinear and discontinuous. It is proposed to solve this problem in three stages: (i) cluster the pairs of input-output data points, resulting in a label for each point; (ii) classify the data, where the corresponding label is the output; and finally (iii) perform one separate regression for each class, where the training data corresponds to the subset of the original input-output pairs which have that label according to the classifier. It has not yet been proposed to combine these three fundamental building blocks of machine learning in this simple and powerful fashion. This can be viewed as a form of deep learning, where any of the intermediate layers can itself be deep. The utility and robustness of the methodology are illustrated on some toy problems, including one example problem arising from simulation of plasma fusion in a tokamak.},
doi = {10.3934/fods.2019020},
journal = {Foundations of Data Science},
number = 4,
volume = 1,
place = {United States},
year = {2019},
month = {12}
}